77.5 | CV | Apr 17 | Code
Beyond Text Prompts: Precise Concept Erasure through Text-Image Collaboration
Jun Li, Lizhi Xiong, Ziqiang Li et al.
Text-to-image generative models have achieved impressive fidelity and diversity, but can inadvertently produce unsafe or undesirable content due to implicit biases embedded in large-scale training datasets. Existing concept erasure methods, whether text-only or image-assisted, face trade-offs: textual approaches often fail to fully suppress concepts, while naive image-guided methods risk over-erasing unrelated content. We propose TICoE, a text-image Collaborative Erasing framework that achieves precise and faithful concept removal through a continuous convex concept manifold and hierarchical visual representation learning. TICoE precisely removes target concepts while preserving unrelated semantic and visual content. To objectively assess the quality of erasure, we further introduce a fidelity-oriented evaluation strategy that measures post-erasure usability. Experiments on multiple benchmarks show that TICoE surpasses prior methods in concept removal precision and content fidelity, enabling safer, more controllable text-to-image generation. Our code is available at https://github.com/OpenAscent-L/TICoE.git
CV | Oct 14, 2023
TS-ENAS: Two-Stage Evolution for Cell-based Network Architecture Search
Juan Zou, Shenghong Wu, Yizhang Xia et al.
Neural network architecture search provides a solution to the automatic design of network structures. However, it is difficult to search the whole network architecture directly. Although searching over stacked cells is an effective way to reduce search complexity, such methods cannot find the globally optimal network structure because the number of layers, the cells, and the connection methods are fixed. In this paper, we propose Two-Stage Evolution for cell-based Network Architecture Search (TS-ENAS), which consists of a first stage that searches based on stacked cells and a second stage that adjusts these cells. In our algorithm, a new cell-based search space and an effective two-stage encoding method are designed to represent cells and neural network structures. In addition, a cell-based weight inheritance strategy is designed to initialize the network weights, which significantly reduces the running time of the algorithm. The proposed method is extensively tested on four image classification datasets (Fashion-MNIST, CIFAR10, CIFAR100, and ImageNet) and compared with 22 state-of-the-art algorithms, including hand-designed networks and NAS networks. The experimental results show that TS-ENAS can more effectively find neural network architectures with competitive performance.
CV | May 18, 2025 | Code
Is Artificial Intelligence Generated Image Detection a Solved Problem?
Ziqiang Li, Jiazhen Yan, Ziwen He et al.
The rapid advancement of generative models, such as GANs and Diffusion models, has enabled the creation of highly realistic synthetic images, raising serious concerns about misinformation, deepfakes, and copyright infringement. Although numerous Artificial Intelligence Generated Image (AIGI) detectors have been proposed, often reporting high accuracy, their effectiveness in real-world scenarios remains questionable. To bridge this gap, we introduce AIGIBench, a comprehensive benchmark designed to rigorously evaluate the robustness and generalization capabilities of state-of-the-art AIGI detectors. AIGIBench simulates real-world challenges through four core tasks: multi-source generalization, robustness to image degradation, sensitivity to data augmentation, and impact of test-time pre-processing. It includes 23 diverse fake image subsets that span both advanced and widely adopted image generation techniques, along with real-world samples collected from social media and AI art platforms. Extensive experiments on 11 advanced detectors demonstrate that, despite their high reported accuracy in controlled settings, these detectors suffer significant performance drops on real-world data, limited benefits from common augmentations, and nuanced effects of pre-processing, highlighting the need for more robust detection strategies. By providing a unified and realistic evaluation framework, AIGIBench offers valuable insights to guide future research toward dependable and generalizable AIGI detection. Data and code are publicly available at: https://github.com/HorizonTEL/AIGIBench.
68.6 | HC | May 7
From Fixed to Flexible: Shaping AI Personality in Context-Sensitive Interaction
Shakyani Jayasiriwardene, Hongyu Zhou, Weiwei Jiang et al.
Conversational agents are increasingly expected to adapt across contexts and evolve their personalities through interactions, yet most remain static once configured. We present an exploratory study of how user expectations form and evolve when agent personality is made dynamically adjustable. To investigate this, we designed a prototype conversational interface that enabled users to adjust an agent's personality along eight research-grounded dimensions across three task contexts: informational, emotional, and appraisal. We conducted an online mixed-methods study with 60 participants, employing latent profile analysis to characterize personality classes and trajectory analysis to trace evolving patterns of personality adjustment. These approaches revealed distinct personality profiles at initial and final configuration stages, and adjustment trajectories, shaped by context-sensitivity. Participants also valued the autonomy, perceived the agent as more anthropomorphic, and reported greater trust. Our findings highlight the importance of designing conversational agents that adapt alongside their users, advancing more responsive and human-centred AI.
CV | Aug 2, 2025
NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection
Jiazhen Yan, Fan Wang, Weiwei Jiang et al.
The rapid progress of generative models, such as GANs and diffusion models, has facilitated the creation of highly realistic images, raising growing concerns over their misuse in security-sensitive domains. While existing detectors perform well under known generative settings, they often fail to generalize to unknown generative models, especially when semantic content between real and fake images is closely aligned. In this paper, we revisit the use of CLIP features for AI-generated image detection and uncover a critical limitation: the high-level semantic information embedded in CLIP's visual features hinders effective discrimination. To address this, we propose NS-Net, a novel detection framework that leverages NULL-Space projection to decouple semantic information from CLIP's visual features, followed by contrastive learning to capture intrinsic distributional differences between real and generated images. Furthermore, we design a Patch Selection strategy to preserve fine-grained artifacts by mitigating semantic bias caused by global image structures. Extensive experiments on an open-world benchmark comprising images generated by 40 diverse generative models show that NS-Net outperforms existing state-of-the-art methods, achieving a 7.4% improvement in detection accuracy, thereby demonstrating strong generalization across both GAN- and diffusion-based image generation techniques.
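The general idea of a null-space projection can be sketched in a few lines. This is a minimal illustration, not the NS-Net implementation: the abstract does not say how the semantic subspace is obtained, so the `semantic_rows` argument and both function names are hypothetical. The sketch removes from a feature vector every component that lies in the span of the given semantic directions, so the result is orthogonal to all of them.

```python
def orthonormalize(rows, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of `rows`."""
    basis = []
    for r in rows:
        w = list(r)
        for q in basis:
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > tol:  # skip directions already covered by the basis
            basis.append([wi / norm for wi in w])
    return basis


def project_to_null_space(x, semantic_rows):
    """Remove from x its components along the (hypothetical) semantic directions.

    The output satisfies dot(row, output) ~ 0 for every row in semantic_rows,
    i.e. it lies in the null space of the matrix whose rows are semantic_rows.
    """
    out = list(x)
    for q in orthonormalize(semantic_rows):
        coeff = sum(oi * qi for oi, qi in zip(out, q))
        out = [oi - coeff * qi for oi, qi in zip(out, q)]
    return out


# Toy example: the first two coordinates are treated as "semantic".
semantic = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
feature = [3.0, 4.0, 5.0]
residual = project_to_null_space(feature, semantic)
```

In this toy case only the third coordinate of `feature` survives, which is the part a downstream classifier would see under this assumption.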
CV | Dec 16, 2024
SitPose: Real-Time Detection of Sitting Posture and Sedentary Behavior Using Ensemble Learning With Depth Sensor
Hang Jin, Xin He, Lingyun Wang et al.
Poor sitting posture can lead to various work-related musculoskeletal disorders (WMSDs). Office employees spend approximately 81.8% of their working time seated, and sedentary behavior can result in chronic diseases such as cervical spondylosis and cardiovascular diseases. To address these health concerns, we present SitPose, a sitting posture and sedentary detection system utilizing the latest Kinect depth camera. The system tracks 3D coordinates of bone joint points in real-time and calculates the angle values of related joints. We established a dataset containing six different sitting postures and one standing posture, totaling 33,409 data points, by recruiting 36 participants. We applied several state-of-the-art machine learning algorithms to the dataset and compared their performance in recognizing the sitting poses. Our results show that the ensemble learning model based on the soft voting mechanism achieves the highest F1 score of 98.1%. Finally, we deployed the SitPose system based on this ensemble model to encourage better sitting posture and to reduce sedentary habits.
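Two steps named in this abstract, computing joint angles from 3D joint coordinates and combining classifiers by soft voting, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the abstract does not specify which joints, classifiers, or weights SitPose uses.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the 3D points a, b, and c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("degenerate joint: two points coincide")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


def soft_vote(prob_rows):
    """Unweighted soft voting: average class probabilities over classifiers.

    prob_rows holds one probability vector per classifier.
    Returns (index of the winning class, averaged probabilities).
    """
    n_classes = len(prob_rows[0])
    avg = [sum(row[k] for row in prob_rows) / len(prob_rows)
           for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: avg[k]), avg


# A right angle at the origin, and three classifiers voting over two classes.
angle = joint_angle((1, 0, 0), (0, 0, 0), (0, 1, 0))
label, probs = soft_vote([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
```

Soft voting averages probabilities rather than counting hard labels, so a confident classifier can outweigh two uncertain ones.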
LG | Dec 13, 2024
Real-Time Fall Detection Using Smartphone Accelerometers and WiFi Channel State Information
Lingyun Wang, Deqi Su, Aohua Zhang et al.
In recent years, as the population ages, falls have increasingly posed a significant threat to the health of the elderly. We propose a real-time fall detection system that integrates the inertial measurement unit (IMU) of a smartphone with optimized Wi-Fi channel state information (CSI) for secondary validation. Initially, the IMU distinguishes falls from routine daily activities with minimal computational demand. Subsequently, the CSI is employed for further assessment, which includes evaluating the individual's post-fall mobility. This methodology not only achieves high accuracy but also reduces energy consumption in the smartphone platform. An Android application developed specifically for the purpose issues an emergency alert if the user experiences a fall and is unable to move. Experimental results indicate that the CSI model, based on convolutional neural networks (CNN), achieves a detection accuracy of 99%, surpassing comparable IMU-only models, and demonstrating significant resilience in distinguishing between falls and non-fall activities.
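The two-stage gating described above, a cheap IMU check first and the costlier CSI assessment only when that check fires, might look roughly like this. Everything specific here is an assumption for illustration: the threshold value, the function names, and the shape of the CSI result are hypothetical, and the paper's CSI stage is a CNN rather than the stub callable used below.

```python
import math

# Hypothetical impact threshold in units of g; not taken from the paper.
IMPACT_THRESHOLD_G = 2.5


def imu_suspects_fall(samples, threshold_g=IMPACT_THRESHOLD_G):
    """Stage 1: flag a possible fall if any acceleration magnitude is large.

    samples is an iterable of (ax, ay, az) accelerometer readings in g.
    """
    return any(math.sqrt(ax * ax + ay * ay + az * az) >= threshold_g
               for ax, ay, az in samples)


def should_alert(samples, csi_check):
    """Stage 2 runs only when stage 1 fires, which is what saves energy.

    csi_check is a zero-argument callable standing in for the CSI model;
    it returns (fall_confirmed, can_move).
    """
    if not imu_suspects_fall(samples):
        return False
    fall_confirmed, can_move = csi_check()
    return bool(fall_confirmed and not can_move)


# Quiet data never reaches the CSI stage; a spike followed by no movement alerts.
quiet = [(0.0, 0.0, 1.0)]
spike = [(0.0, 0.0, 1.0), (2.0, 2.0, 1.0)]
alert = should_alert(spike, lambda: (True, False))
```

The alert condition mirrors the abstract: a confirmed fall plus an inability to move.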
CV | Mar 24, 2025
CalFuse: Multi-Modal Continual Learning via Feature Calibration and Parameter Fusion
Juncen Guo, Siao Liu, Xiaoguang Zhu et al.
With the proliferation of multi-modal data in large-scale visual recognition systems, enabling models to continuously acquire knowledge from evolving data streams while preserving prior information has become increasingly critical. Class-Continual Learning (CCL) addresses this challenge by incrementally incorporating new class knowledge without revisiting historical data, making it essential for real-world big data applications. While traditional CCL methods rely solely on visual features, recent advances in Vision-Language Models (VLMs) such as CLIP demonstrate significant potential for CCL by leveraging pre-trained multi-modal knowledge. However, existing approaches face challenges in mitigating catastrophic forgetting while maintaining the cross-modal generalization capabilities of VLMs. To address these limitations, we propose CalFuse, a framework that synergizes feature Calibration with parameter Fusion to enable effective multi-modal knowledge integration in continual learning scenarios. CalFuse introduces a dynamic feature calibration mechanism that adaptively balances original CLIP visual representations with task-specific features, preserving the model's intrinsic cross-modal generalization while adapting to new classes. Concurrently, a QR decomposition-based parameter fusion strategy progressively integrates newly acquired knowledge with historical task parameters, maintaining equilibrium between learning new class representations and retaining prior knowledge across sequential tasks. Extensive experiments on benchmark datasets validate the effectiveness of our approach in large-scale multi-modal continual learning settings, demonstrating superior performance over state-of-the-art methods in both average accuracy and final task retention.
HC | Dec 11, 2021
UbiNIRS: A Software Framework for Miniaturized NIRS-based Applications
Weiwei Jiang, Zhanna Sarsenbayeva, Difeng Yu et al.
We present UbiNIRS, a software framework for rapid development and deployment of applications using miniaturized near-infrared spectroscopy (NIRS). NIRS is an emerging material sensing technology that has shown great potential in recent work from the HCI community, such as in situ pill testing. However, existing methods require significant programming effort and professional knowledge of NIRS, and hence hinder the creation of new NIRS-based applications. Our system helps to resolve this issue by providing a generic server and a mobile app, using the best practices for NIRS applications in the literature. The server creates and manages UbiNIRS instances without the need for any coding or professional knowledge of NIRS. The mobile app can register multiple UbiNIRS instances by communicating with the server for different NIRS-based applications. Furthermore, UbiNIRS enables NIRS spectrum crowdsourcing for building a knowledge base.
HC | Dec 1, 2021
InfoPrint: Embedding Information into 3D Printed Objects
Weiwei Jiang, Chaofan Wang, Zhanna Sarsenbayeva et al.
We present a technique to embed information invisible to the eye inside 3D printed objects. The information is integrated in the object model, and then fabricated using off-the-shelf dual-head FDM (Fused Deposition Modeling) 3D printers. Our process does not require human intervention during or after printing with the integrated model. The information can be arbitrary symbols, such as icons, text, binary, or handwriting. To retrieve the information, we evaluate two different infrared-based imaging devices that are readily available: thermal cameras and near-infrared scanners. Based on our results, we propose design guidelines for a range of use cases to embed and extract hidden information. We demonstrate how our method can be used for different applications, such as interactive thermal displays, hidden board game tokens, tagging functional printed objects, and autographing non-fungible fabrication work.
AI | Jul 26, 2021
3D AGSE-VNet: An Automatic Brain Tumor MRI Data Segmentation Framework
Xi Guan, Guang Yang, Jianming Ye et al.
Background: Glioma is the most common malignant brain tumor, with a high morbidity rate and a mortality rate of more than three percent, and it seriously endangers human health. The main method of imaging brain tumors in the clinic is MRI. Segmentation of brain tumor regions from multi-modal MRI scans is helpful for treatment inspection, post-diagnosis monitoring, and evaluation of treatment effects. However, clinical brain tumor segmentation is still commonly done by hand, which is time-consuming and leads to large performance differences between operators; a consistent and accurate automatic segmentation method is urgently needed. Methods: To meet these challenges, we propose an automatic brain tumor MRI data segmentation framework called AGSE-VNet. In our study, a Squeeze and Excite (SE) module is added to each encoder and an Attention Guide Filter (AG) module is added to each decoder. The SE module uses channel relationships to automatically enhance useful information in the channels and suppress useless information, while the AG module uses an attention mechanism to guide edge information and remove the influence of irrelevant information such as noise. Results: We used the BraTS2020 challenge online verification tool to evaluate our approach. The Dice scores of the whole tumor (WT), tumor core (TC), and enhanced tumor (ET) are 0.68, 0.85, and 0.70, respectively. Conclusion: Although MRI images have different intensities, AGSE-VNet is not affected by the size of the tumor and can more accurately extract the features of the three regions. It has achieved impressive results and contributes to the clinical diagnosis and treatment of brain tumor patients.
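For readers unfamiliar with the metric reported above, the Dice score between a predicted mask P and a reference mask T is 2|P ∩ T| / (|P| + |T|). A minimal sketch on flattened binary masks follows; the function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something stated in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# One overlapping voxel out of two predicted and two reference voxels.
score = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
```

A score of 1.0 means the masks coincide exactly and 0.0 means they do not overlap at all.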
LG | Jul 6, 2021
An Evaluation of Machine Learning and Deep Learning Models for Drought Prediction using Weather Data
Weiwei Jiang, Jiayun Luo
Drought is a serious natural disaster with a long duration and a wide range of influence. Drought prediction is the basis for planning the drought prevention and disaster reduction measures that decrease drought-caused losses. While this problem has been studied in the literature, it remains unknown whether drought can be precisely predicted with machine learning models using weather data. To answer this question, this study leverages a real-world public dataset and predicts different drought levels using the last 90 days of 18 meteorological indicators as the predictors. In a comprehensive approach, 16 machine learning models and 16 deep learning models are evaluated and compared. The results show that no single model achieves the best performance on all evaluation metrics simultaneously, which indicates that the drought prediction problem is still challenging. As benchmarks for further studies, the code and results are publicly available in a GitHub repository.
NI | Jun 4, 2021
Graph-based Deep Learning for Communication Networks: A Survey
Weiwei Jiang
Communication networks are important infrastructures in contemporary society. Many challenges in this active research area are still not fully solved, and new solutions are proposed continuously. In recent years, graph-based deep learning has been used to model network topology and has achieved state-of-the-art performance in a series of problems in communication networks. In this survey, we review the rapidly growing body of research using different graph-based deep learning models, e.g. graph convolutional and graph attention networks, in various problems from different types of communication networks, e.g. wireless networks, wired networks, and software defined networks. We also present a well-organized list of the problem and solution for each study and identify future research directions. To the best of our knowledge, this paper is the first survey that focuses on the application of graph-based deep learning methods in communication networks involving both wired and wireless scenarios. To track the follow-up research, a public GitHub repository is created, where the relevant papers will be updated continuously.
LG | Mar 18, 2021
Big Data for Traffic Estimation and Prediction: A Survey of Data and Tools
Weiwei Jiang, Jiayun Luo
Big data has been used widely in many areas, including the transportation industry. Using various data sources, traffic states can be well estimated and further predicted to improve overall operational efficiency. In line with this trend, this study presents an up-to-date survey of open data and big data tools used for traffic estimation and prediction. Different data types are categorized and the off-the-shelf tools are introduced. To further promote the use of big data for traffic estimation and prediction tasks, challenges and future directions are also discussed.
LG | Jan 27, 2021
Graph Neural Network for Traffic Forecasting: A Survey
Weiwei Jiang, Jiayun Luo
Traffic forecasting is important for the success of intelligent transportation systems. Deep learning models, including convolutional neural networks and recurrent neural networks, have been extensively applied in traffic forecasting problems to model spatial and temporal dependencies. In recent years, to model the graph structures in transportation systems as well as contextual information, graph neural networks have been introduced and have achieved state-of-the-art performance in a series of traffic forecasting problems. In this survey, we review the rapidly growing body of research using different graph neural networks, e.g. graph convolutional and graph attention networks, in various traffic forecasting problems, e.g. road traffic flow and speed forecasting, passenger flow forecasting in urban rail transit systems, and demand forecasting in ride-hailing platforms. We also present a comprehensive list of open data and source resources for each problem and identify future research directions. To the best of our knowledge, this paper is the first comprehensive survey that explores the application of graph neural networks for traffic forecasting problems. We have also created a public GitHub repository where the latest papers, open data, and source resources will be updated.
CV | Apr 8, 2020
MNIST-MIX: A Multi-language Handwritten Digit Recognition Dataset
Weiwei Jiang
In this letter, we contribute a multi-language handwritten digit recognition dataset named MNIST-MIX, which is the largest dataset of its type in terms of both languages and data samples. Sharing the same data format as MNIST, MNIST-MIX can be seamlessly applied in existing studies for handwritten digit recognition. By introducing digits from 10 different languages, MNIST-MIX becomes a more challenging dataset, and its imbalanced classification requires a better design of models. We also present the results of applying a LeNet model pre-trained on MNIST as the baseline.
ST | Feb 29, 2020
Applications of deep learning in stock market prediction: recent progress
Weiwei Jiang
Stock market prediction is a classical yet challenging problem that has attracted attention from both economists and computer scientists. To build effective prediction models, both linear and machine learning tools have been explored over the past couple of decades. Lately, deep learning models have been introduced as a new frontier for this topic, and their development is too rapid to follow easily. Hence, our motivation for this survey is to give an up-to-date review of recent work on deep learning models for stock market prediction. We categorize not only the different data sources, neural network structures, and commonly used evaluation metrics, but also the implementation and reproducibility of the studies. Our goal is to help interested researchers keep up with the latest progress and easily reproduce previous studies as baselines. Based on this summary, we also highlight some future research directions on this topic.