Ben Niu

CV
h-index26
12papers
852citations
Novelty51%
AI Score51

12 Papers

CVJan 5
SLGNet: Synergizing Structural Priors and Language-Guided Modulation for Multimodal Object Detection

Xiantai Xiang, Guangyao Zhou, Zixiao Wen et al.

Multimodal object detection leveraging RGB and Infrared (IR) images is pivotal for robust perception in all-weather scenarios. While recent adapter-based approaches efficiently transfer RGB-pretrained foundation models to this task, they often prioritize model efficiency at the expense of cross-modal structural consistency. Consequently, critical structural cues are frequently lost when significant domain gaps arise, such as in high-contrast or nighttime environments. Moreover, conventional static multimodal fusion mechanisms typically lack environmental awareness, resulting in suboptimal adaptation and constrained detection performance under complex, dynamic scene variations. To address these limitations, we propose SLGNet, a parameter-efficient framework that synergizes hierarchical structural priors and language-guided modulation within a frozen Vision Transformer (ViT)-based foundation model. Specifically, we design a Structure-Aware Adapter to extract hierarchical structural representations from both modalities and dynamically inject them into the ViT to compensate for structural degradation inherent in ViT-based backbones. Furthermore, we propose a Language-Guided Modulation module that exploits VLM-driven structured captions to dynamically recalibrate visual features, thereby endowing the model with robust environmental awareness. Extensive experiments on the LLVIP, FLIR, KAIST, and DroneVehicle datasets demonstrate that SLGNet establishes new state-of-the-art performance. Notably, on the LLVIP benchmark, our method achieves an mAP of 66.1, while reducing trainable parameters by approximately 87% compared to traditional full fine-tuning. This confirms SLGNet as a robust and efficient solution for multimodal perception.

LGJun 6, 2023
Decoding Virtual Healthcare Success through Knowledge-Aware and Multimodal Predictive Modeling

Shuang Geng, Wenli Zhang, Jiaheng Xie et al.

Online healthcare consultations have transformed how patients seek medical advice, offering convenience while introducing new challenges for ensuring consultation success. Predicting whether an online consultation will be successful is critical for improving patient experiences and sustaining platform competitiveness. Yet, such prediction is inherently difficult due to the fragmented nature of patients' care journeys and the lack of integration between virtual and traditional healthcare systems. Furthermore, the data collected from online platforms, including textual conversations, interaction sequences, and behavioral traces, are often sparse and incomplete. This study develops a predictive modeling approach that fuses multimodal data and dynamically constructed knowledge networks to capture latent relationships among patients, physicians, and consultation contexts. By integrating heterogeneous information sources and uncovering the evolving structure of digital interactions, the model enhances the accuracy and interpretability of consultation success prediction. The findings offer implications for designing hybrid healthcare ecosystems that combine online and offline services through data-driven intelligence.

CVSep 24, 2024
Training Data Attribution: Was Your Model Secretly Trained On Data Created By Mine?

Likun Zhang, Hao Wu, Lingcui Zhang et al.

The emergence of text-to-image models has recently sparked significant interest, but the attendant is a looming shadow of potential infringement by violating the user terms. Specifically, an adversary may exploit data created by a commercial model to train their own without proper authorization. To address such risk, it is crucial to investigate the attribution of a suspicious model's training data by determining whether its training data originates, wholly or partially, from a specific source model. To trace the generated data, existing methods require applying extra watermarks during either the training or inference phases of the source model. However, these methods are impractical for pre-trained models that have been released, especially when model owners lack security expertise. To tackle this challenge, we propose an injection-free training data attribution method for text-to-image models. It can identify whether a suspicious model's training data stems from a source model, without additional modifications on the source model. The crux of our method lies in the inherent memorization characteristic of text-to-image models. Our core insight is that the memorization of the training dataset is passed down through the data generated by the source model to the model trained on that data, making the source model and the infringing model exhibit consistent behaviors on specific samples. Therefore, our approach involves developing algorithms to uncover these distinct samples and using them as inherent watermarks to verify if a suspicious model originates from the source model. Our experiments demonstrate that our method achieves an accuracy of over 80\% in identifying the source of a suspicious model's training data, without interfering the original training or generation process of the source model.

HCAug 30, 2024
Exploring User Acceptance Of Portable Intelligent Personal Assistants: A Hybrid Approach Using PLS-SEM And fsQCA

Gustave Florentin Nkoulou Mvondo, Ben Niu

This research explores the factors driving user acceptance of Rabbit R1, a newly developed portable intelligent personal assistant (PIPA) that aims to redefine user interaction and control. The study extends the technology acceptance model (TAM) by incorporating artificial intelligence-specific factors (conversational intelligence, task intelligence, and perceived naturalness), user interface design factors (simplicity in information design and visual aesthetics), and user acceptance and loyalty. Using a purposive sampling method, we gathered data from 824 users in the US and analyzed the sample through partial least squares structural equation modeling (PLS-SEM) and fuzzy set qualitative comparative analysis (fsQCA). The findings reveal that all hypothesized relationships, including both direct and indirect effects, are supported. Additionally, fsQCA supports the PLS-SEM findings and identifies three configurations leading to high and low user acceptance. This research enriches the literature and provides valuable insights for system designers and marketers of PIPAs, guiding strategic decisions to foster widespread adoption and long-term engagement.

CVJan 7
GeoReason: Aligning Thinking And Answering In Remote Sensing Vision-Language Models Via Logical Consistency Reinforcement Learning

Wenshuai Li, Xiantai Xiang, Zixiao Wen et al.

The evolution of Remote Sensing Vision-Language Models(RS-VLMs) emphasizes the importance of transitioning from perception-centric recognition toward high-level deductive reasoning to enhance cognitive reliability in complex spatial tasks. However, current models often suffer from logical hallucinations, where correct answers are derived from flawed reasoning chains or rely on positional shortcuts rather than spatial logic. This decoupling undermines reliability in strategic spatial decision-making. To address this, we present GeoReason, a framework designed to synchronize internal thinking with final decisions. We first construct GeoReason-Bench, a logic-driven dataset containing 4,000 reasoning trajectories synthesized from geometric primitives and expert knowledge. We then formulate a two-stage training strategy: (1) Supervised Knowledge Initialization to equip the model with reasoning syntax and domain expertise, and (2) Consistency-Aware Reinforcement Learning to refine deductive reliability. This second stage integrates a novel Logical Consistency Reward, which penalizes logical drift via an option permutation strategy to anchor decisions in verifiable reasoning traces. Experimental results demonstrate that our framework significantly enhances the cognitive reliability and interpretability of RS-VLMs, achieving state-of-the-art performance compared to other advanced methods.

CLSep 5, 2024
Bypassing DARCY Defense: Indistinguishable Universal Adversarial Triggers

Zuquan Peng, Yuanyuan He, Jianbing Ni et al.

Neural networks (NN) classification models for Natural Language Processing (NLP) are vulnerable to the Universal Adversarial Triggers (UAT) attack that triggers a model to produce a specific prediction for any input. DARCY borrows the "honeypot" concept to bait multiple trapdoors, effectively detecting the adversarial examples generated by UAT. Unfortunately, we find a new UAT generation method, called IndisUAT, which produces triggers (i.e., tokens) and uses them to craft adversarial examples whose feature distribution is indistinguishable from that of the benign examples in a randomly-chosen category at the detection layer of DARCY. The produced adversarial examples incur the maximal loss of predicting results in the DARCY-protected models. Meanwhile, the produced triggers are effective in black-box models for text generation, text inference, and reading comprehension. Finally, the evaluation results under NN models for NLP tasks indicate that the IndisUAT method can effectively circumvent DARCY and penetrate other defenses. For example, IndisUAT can reduce the true positive rate of DARCY's detection by at least 40.8% and 90.6%, and drop the accuracy by at least 33.3% and 51.6% in the RNN and CNN models, respectively. IndisUAT reduces the accuracy of the BERT's adversarial defense model by at least 34.0%, and makes the GPT-2 language model spew racist outputs even when conditioned on non-racial context.

AIMay 7, 2024
Factors Influencing User Willingness To Use SORA

Gustave Florentin Nkoulou Mvondo, Ben Niu

Sora promises to redefine the way visual content is created. Despite its numerous forecasted benefits, the drivers of user willingness to use the text-to-video (T2V) model are unknown. This study extends the extended unified theory of acceptance and use of technology (UTAUT2) with perceived realism and novelty value. Using a purposive sampling method, we collected data from 940 respondents in the US and analyzed the sample using covariance-based structural equation modeling and fuzzy set qualitative comparative analysis (fsQCA). The findings reveal that all hypothesized relationships are supported, with perceived realism emerging as the most influential driver, followed by novelty value. Moreover, fsQCA identifies five configurations leading to high and low willingness to use, and the model demonstrates high predictive validity, contributing to theory advancement. Our study provides valuable insights for developers and marketers, offering guidance for strategic decisions to promote the widespread adoption of T2V models.

CLJun 4, 2025
An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals

Yangyang Zhao, Ben Niu, Libo Qin et al.

Deep Reinforcement Learning (DRL) is widely used in task-oriented dialogue systems to optimize dialogue policy, but it struggles to balance exploration and exploitation due to the high dimensionality of state and action spaces. This challenge often results in local optima or poor convergence. Evolutionary Algorithms (EAs) have been proven to effectively explore the solution space of neural networks by maintaining population diversity. Inspired by this, we innovatively combine the global search capabilities of EA with the local optimization of DRL to achieve a balance between exploration and exploitation. Nevertheless, the inherent flexibility of natural language in dialogue tasks complicates this direct integration, leading to prolonged evolutionary times. Thus, we further propose an elite individual injection mechanism to enhance EA's search efficiency by adaptively introducing best-performing individuals into the population. Experiments across four datasets show that our approach significantly improves the balance between exploration and exploitation, boosting performance. Moreover, the effectiveness of the EII mechanism in reducing exploration time has been demonstrated, achieving an efficient integration of EA and DRL on task-oriented dialogue policy tasks.

LGSep 27, 2025
DPFNAS: Differential Privacy-Enhanced Federated Neural Architecture Search for 6G Edge Intelligence

Yang Lv, Jin Cao, Ben Niu et al.

The Sixth-Generation (6G) network envisions pervasive artificial intelligence (AI) as a core goal, enabled by edge intelligence through on-device data utilization. To realize this vision, federated learning (FL) has emerged as a key paradigm for collaborative training across edge devices. However, the sensitivity and heterogeneity of edge data pose key challenges to FL: parameter sharing risks data reconstruction, and a unified global model struggles to adapt to diverse local distributions. In this paper, we propose a novel federated learning framework that integrates personalized differential privacy (DP) and adaptive model design. To protect training data, we leverage sample-level representations for knowledge sharing and apply a personalized DP strategy to resist reconstruction attacks. To ensure distribution-aware adaptation under privacy constraints, we develop a privacy-aware neural architecture search (NAS) algorithm that generates locally customized architectures and hyperparameters. To the best of our knowledge, this is the first personalized DP solution tailored for representation-based FL with theoretical convergence guarantees. Our scheme achieves strong privacy guarantees for training data while significantly outperforming state-of-the-art methods in model performance. Experiments on benchmark datasets such as CIFAR-10 and CIFAR-100 demonstrate that our scheme improves accuracy by 6.82\% over the federated NAS method PerFedRLNAS, while reducing model size to 1/10 and communication cost to 1/20.

IVAug 20, 2020
Single Image Super-Resolution via a Holistic Attention Network

Ben Niu, Weilei Wen, Wenqi Ren et al.

Informative features play a crucial role in the single image super-resolution task. Channel attention has been demonstrated to be effective for preserving information-rich features in each layer. However, channel attention treats each convolution layer as a separate process that misses the correlation among different layers. To address this problem, we propose a new holistic attention network (HAN), which consists of a layer attention module (LAM) and a channel-spatial attention module (CSAM), to model the holistic interdependencies among layers, channels, and positions. Specifically, the proposed LAM adaptively emphasizes hierarchical features by considering correlations among layers. Meanwhile, CSAM learns the confidence at all the positions of each channel to selectively capture more informative features. Extensive experiments demonstrate that the proposed HAN performs favorably against the state-of-the-art single image super-resolution approaches.

LGFeb 20, 2020
Input Perturbation: A New Paradigm between Central and Local Differential Privacy

Yilin Kang, Yong Liu, Ben Niu et al.

Traditionally, there are two models on differential privacy: the central model and the local model. The central model focuses on the machine learning model and the local model focuses on the training data. In this paper, we study the \textit{input perturbation} method in differentially private empirical risk minimization (DP-ERM), preserving privacy of the central model. By adding noise to the original training data and training with the `perturbed data', we achieve ($ε$,$δ$)-differential privacy on the final model, along with some kind of privacy on the original data. We observe that there is an interesting connection between the local model and the central model: the perturbation on the original data causes the perturbation on the gradient, and finally the model parameters. This observation means that our method builds a bridge between local and central model, protecting the data, the gradient and the model simultaneously, which is more superior than previous central methods. Detailed theoretical analysis and experiments show that our method achieves almost the same (or even better) performance as some of the best previous central methods with more protections on privacy, which is an attractive result. Moreover, we extend our method to a more general case: the loss function satisfies the Polyak-Lojasiewicz condition, which is more general than strong convexity, the constraint on the loss function in most previous work.

CRJan 31, 2014
Priority-Aware Private Matching Schemes for Proximity-Based Mobile Social Networks

Ben Niu, Tanran Zhang, Xiaoyan Zhu et al.

The rapid developments of mobile devices and online social networks have resulted in increasing attention to Mobile Social Networking (MSN). The explosive growth of mobile-connected and location-aware devices makes it possible and meaningful to do the Proximity-based Mobile Social Networks (PMSNs). Users can discover and make new social interactions easily with physical-proximate mobile users through WiFi/Bluetooth interfaces embedded in their smartphones. However, users enjoy these conveniences at the cost of their growing privacy concerns. To address this problem, we propose a suit of priority-aware private matching schemes to privately match the similarity with potential friends in the vicinity. Unlike most existing work, our proposed priority-aware matching scheme (P-match) achieves the privacy goal by combining the commutative encryption function and the Tanimoto similarity coefficient which considers both the number of common attributes between users as well as the corresponding priorities on each common attribute. Further, based on the newly constructed similarity function which takes the ratio of attributes matched over all the input set into consideration, we design an enhanced version to deal with some potential attacks such as unlimitedly inputting the attribute set on either the initiator side or the responder side, etc. Finally, our proposed E-match avoids the heavy cryptographic operations and improves the system performance significantly by employing a novel use of the Bloom filter. The security and communication/computation overhead of our schemes are thoroughly analyzed and evaluated via detailed simulations and implementation.