Xiang Wei

CV
h-index9
15papers
936citations
Novelty52%
AI Score59

15 Papers

CLFeb 20, 2023
ChatIE: Zero-Shot Information Extraction via Chatting with ChatGPT

Xiang Wei, Xingyu Cui, Ning Cheng et al.

Zero-shot information extraction (IE) aims to build IE systems from the unannotated text. It is challenging due to involving little human intervention. Challenging but worthwhile, zero-shot IE reduces the time and effort that data labeling takes. Recent efforts on large language models (LLMs, e.g., GPT-3, ChatGPT) show promising performance on zero-shot settings, thus inspiring us to explore prompt-based methods. In this work, we ask whether strong IE models can be constructed by directly prompting LLMs. Specifically, we transform the zero-shot IE task into a multi-turn question-answering problem with a two-stage framework (ChatIE). With the power of ChatGPT, we extensively evaluate our framework on three IE tasks: entity-relation triple extract, named entity recognition, and event extraction. Empirical results on six datasets across two languages show that ChatIE achieves impressive performance and even surpasses some full-shot models on several datasets (e.g., NYT11-HRL). We believe that our work could shed light on building IE models with limited resources.

CLJul 3, 2023
CollabKG: A Learnable Human-Machine-Cooperative Information Extraction Toolkit for (Event) Knowledge Graph Construction

Xiang Wei, Yufeng Chen, Ning Cheng et al.

In order to construct or extend entity-centric and event-centric knowledge graphs (KG and EKG), the information extraction (IE) annotation toolkit is essential. However, existing IE toolkits have several non-trivial problems, such as not supporting multi-tasks, not supporting automatic updates. In this work, we present CollabKG, a learnable human-machine-cooperative IE toolkit for KG and EKG construction. Specifically, for the multi-task issue, CollabKG unifies different IE subtasks, including named entity recognition (NER), entity-relation triple extraction (RE), and event extraction (EE), and supports both KG and EKG. Then, combining advanced prompting-based IE technology, the human-machine-cooperation mechanism with LLMs as the assistant machine is presented which can provide a lower cost as well as a higher performance. Lastly, owing to the two-way interaction between the human and machine, CollabKG with learning ability allows self-renewal. Besides, CollabKG has several appealing features (e.g., customization, training-free, propagation, etc.) that make the system powerful, easy-to-use, and high-productivity. We holistically compare our toolkit with other existing tools on these features. Human evaluation quantitatively illustrates that CollabKG significantly improves annotation quality, efficiency, and stability simultaneously.

43.1CVMay 15
Neutral-Reference Prompting for Vision-Language Models

Senmao Tian, Xiang Wei, Shunli Zhang

Efficient transfer learning of vision-language models (VLMs) commonly suffers from a Base-New Trade-off (BNT): improving performance on unseen (new) classes often degrades accuracy on known (base) classes. Addressing how to boost recognition of unseen classes without sacrificing known-class performance remains a central challenge. Existing work often simplistically attributes the BNT to overfitting on known classes. We observe an interesting phenomenon: VLMs frequently exhibit asymmetric confusion on certain downstream data, i.e., samples of class A are systematically mispredicted as class B, while the reverse confusion (B to A) rarely occurs. For known classes, this kind of bias can be mitigated by tuning using a cross-entropy loss, but for unseen classes, such pretraining-induced bias persists and harms generalization. Motivated by this, we propose NeRP, a plug-and-play prompting correction strategy that improves discrimination on unseen classes without modifying model parameters. NeRP leverages neutral text prompts and reference images to measure class-wise prior preferences along the pre-trained inter-class geometry, and combines them with the sample likelihood to obtain the model's surrogate score. If, for a given sample, the prior strongly favors the current prediction while the observed evidence is clearly insufficient, we perform a local flip between easily confusable class pairs, thereby correcting prior-dominated mispredictions. Extensive experiments across multiple backbones and 15 few-shot and cross-domain benchmarks show that NeRP substantially improves accuracy on unseen classes while preserving known-class prediction performance.

CVFeb 3, 2025Code
Exploring Few-Shot Defect Segmentation in General Industrial Scenarios with Metric Learning and Vision Foundation Models

Tongkun Liu, Bing Li, Xiao Jin et al.

Industrial defect segmentation is critical for manufacturing quality control. Due to the scarcity of training defect samples, few-shot semantic segmentation (FSS) holds significant value in this field. However, existing studies mostly apply FSS to tackle defects on simple textures, without considering more diverse scenarios. This paper aims to address this gap by exploring FSS in broader industrial products with various defect types. To this end, we contribute a new real-world dataset and reorganize some existing datasets to build a more comprehensive few-shot defect segmentation (FDS) benchmark. On this benchmark, we thoroughly investigate metric learning-based FSS methods, including those based on meta-learning and those based on Vision Foundation Models (VFMs). We observe that existing meta-learning-based methods are generally not well-suited for this task, while VFMs hold great potential. We further systematically study the applicability of various VFMs in this task, involving two paradigms: feature matching and the use of Segment Anything (SAM) models. We propose a novel efficient FDS method based on feature matching. Meanwhile, we find that SAM2 is particularly effective for addressing FDS through its video track mode. The contributed dataset and code will be available at: https://github.com/liutongkun/GFDS.

80.4NEMay 14
Advancing Direct Training for Spiking Neural Networks with Circulate-Firing Neurons and Learnable Gradients

Feifan Zhou, Xiang Wei, Yang Liu et al.

Spiking Neural Networks (SNNs) have emerged with promising energy-efficient property, yet a substantial performance gap persists compared to Artificial Neural Networks (ANNs). This gap stems from at least two key limitations: first, conventional spiking neurons offer limited information representation capacity, underutilizing the rich dynamics of membrane potentials; second, fixed surrogate gradient (SG) functions across time steps leads to imprecise gradient propagation, impeding effective direct training. To address these two challenges, we propose a new direct training algorithm with three core innovations: first, a circulate-firing spiking neuron model that enhances information representation capacity by leveraging membrane potentials more effectively; second, a time-step-wise learnable surrogate gradient function, enabling accurate gradient estimation during backpropagation; third, a positive-negative balanced loss function to achieve equilibrium between positive and negative membrane potentials and further boost SNN performance. Extensive experiments demonstrate that our methods achieve competitive performance across multiple datasets. Our methods can generalize seamlessly to advanced architectures of Transformer, consistently outperforming existing methods. Our work highlights the effectiveness of further harnessing intrinsic membrane dynamics of SNNs for performance improvement, and thus open a new avenue for advancing high-performance spiking neural architectures.

CLFeb 11, 2024
TransGPT: Multi-modal Generative Pre-trained Transformer for Transportation

Peng Wang, Xiang Wei, Fangxu Hu et al.

Natural language processing (NLP) is a key component of intelligent transportation systems (ITS), but it faces many challenges in the transportation domain, such as domain-specific knowledge and data, and multi-modal inputs and outputs. This paper presents TransGPT, a novel (multi-modal) large language model for the transportation domain, which consists of two independent variants: TransGPT-SM for single-modal data and TransGPT-MM for multi-modal data. TransGPT-SM is finetuned on a single-modal Transportation dataset (STD) that contains textual data from various sources in the transportation domain. TransGPT-MM is finetuned on a multi-modal Transportation dataset (MTD) that we manually collected from three areas of the transportation domain: driving tests, traffic signs, and landmarks. We evaluate TransGPT on several benchmark datasets for different tasks in the transportation domain, and show that it outperforms baseline models on most tasks. We also showcase the potential applications of TransGPT for traffic analysis and modeling, such as generating synthetic traffic scenarios, explaining traffic phenomena, answering traffic-related questions, providing traffic recommendations, and generating traffic reports. This work advances the state-of-the-art of NLP in the transportation domain and provides a useful tool for ITS researchers and practitioners.

CVApr 23, 2024
Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

Jingyao Xu, Yuetong Lu, Yandong Li et al.

Diffusion models (DMs) embark a new era of generative modeling and offer more opportunities for efficient generating high-quality and realistic data samples. However, their widespread use has also brought forth new challenges in model security, which motivates the creation of more effective adversarial attackers on DMs to understand its vulnerability. We propose CAAT, a simple but generic and efficient approach that does not require costly training to effectively fool latent diffusion models (LDMs). The approach is based on the observation that cross-attention layers exhibits higher sensitivity to gradient change, allowing for leveraging subtle perturbations on published images to significantly corrupt the generated images. We show that a subtle perturbation on an image can significantly impact the cross-attention layers, thus changing the mapping between text and image during the fine-tuning of customized diffusion models. Extensive experiments demonstrate that CAAT is compatible with diverse diffusion models and outperforms baseline attack methods in a more effective (more noise) and efficient (twice as fast as Anti-DreamBooth and Mist) manner.

LGDec 13, 2023
Levenshtein Distance Embedding with Poisson Regression for DNA Storage

Xiang Wei, Alan J. X. Guo, Sihan Sun et al.

Efficient computation or approximation of Levenshtein distance, a widely-used metric for evaluating sequence similarity, has attracted significant attention with the emergence of DNA storage and other biological applications. Sequence embedding, which maps Levenshtein distance to a conventional distance between embedding vectors, has emerged as a promising solution. In this paper, a novel neural network-based sequence embedding technique using Poisson regression is proposed. We first provide a theoretical analysis of the impact of embedding dimension on model performance and present a criterion for selecting an appropriate embedding dimension. Under this embedding dimension, the Poisson regression is introduced by assuming the Levenshtein distance between sequences of fixed length following a Poisson distribution, which naturally aligns with the definition of Levenshtein distance. Moreover, from the perspective of the distribution of embedding distances, Poisson regression approximates the negative log likelihood of the chi-squared distribution and offers advancements in removing the skewness. Through comprehensive experiments on real DNA storage data, we demonstrate the superior performance of the proposed method compared to state-of-the-art approaches.

CVApr 9, 2025
LCGC: Learning from Consistency Gradient Conflicting for Class-Imbalanced Semi-Supervised Debiasing

Weiwei Xing, Yue Cheng, Hongzhu Yi et al.

Classifiers often learn to be biased corresponding to the class-imbalanced dataset, especially under the semi-supervised learning (SSL) set. While previous work tries to appropriately re-balance the classifiers by subtracting a class-irrelevant image's logit, but lacks a firm theoretical basis. We theoretically analyze why exploiting a baseline image can refine pseudo-labels and prove that the black image is the best choice. We also indicated that as the training process deepens, the pseudo-labels before and after refinement become closer. Based on this observation, we propose a debiasing scheme dubbed LCGC, which Learning from Consistency Gradient Conflicting, by encouraging biased class predictions during training. We intentionally update the pseudo-labels whose gradient conflicts with the debiased logits, representing the optimization direction offered by the over-imbalanced classifier predictions. Then, we debiased the predictions by subtracting the baseline image logits during testing. Extensive experiments demonstrate that LCGC can significantly improve the prediction accuracy of existing CISSL models on public benchmarks.

ITDec 20, 2023
DoDo-Code: an Efficient Levenshtein Distance Embedding-based Code for 4-ary IDS Channel

Alan J. X. Guo, Sihan Sun, Xiang Wei et al.

With the emergence of new storage and communication methods, the insertion, deletion, and substitution (IDS) channel has attracted considerable attention. However, many topics on the IDS channel and the associated Levenshtein distance remain open, making the invention of a novel IDS-correcting code a hard task. Furthermore, current studies on single-IDS-correcting code misalign with the requirements of applications which necessitates the correcting of multiple errors. Compromise solutions have involved shortening codewords to reduce the chance of multiple errors. However, the code rates of existing codes are poor at short lengths, diminishing the overall storage density. In this study, a novel method is introduced for designing high-code-rate single-IDS-correcting codewords through deep Levenshtein distance embedding. A deep learning model is utilized to project the sequences into embedding vectors that preserve the Levenshtein distances between the original sequences. This embedding space serves as a proxy for the complex Levenshtein domain, within which algorithms for codeword search and segment correcting is developed. While the concept underpinning this approach is straightforward, it bypasses the mathematical challenges typically encountered in code design. The proposed method results in a code rate that outperforms existing combinatorial solutions, particularly for designing short-length codewords.

LGNov 24, 2025
Sampling Control for Imbalanced Calibration in Semi-Supervised Learning

Senmao Tian, Xiang Wei, Shunli Zhang

Class imbalance remains a critical challenge in semi-supervised learning (SSL), especially when distributional mismatches between labeled and unlabeled data lead to biased classification. Although existing methods address this issue by adjusting logits based on the estimated class distribution of unlabeled data, they often handle model imbalance in a coarse-grained manner, conflating data imbalance with bias arising from varying class-specific learning difficulties. To address this issue, we propose a unified framework, SC-SSL, which suppresses model bias through decoupled sampling control. During training, we identify the key variables for sampling control under ideal conditions. By introducing a classifier with explicit expansion capability and adaptively adjusting sampling probabilities across different data distributions, SC-SSL mitigates feature-level imbalance for minority classes. In the inference phase, we further analyze the weight imbalance of the linear classifier and apply post-hoc sampling control with an optimization bias vector to directly calibrate the logits. Extensive experiments across various benchmark datasets and distribution settings validate the consistency and state-of-the-art performance of SC-SSL.

CVAug 5, 2025
GaitAdapt: Continual Learning for Evolving Gait Recognition

Jingjie Wang, Shunli Zhang, Xiang Wei et al.

Current gait recognition methodologies generally necessitate retraining when encountering new datasets. Nevertheless, retrained models frequently encounter difficulties in preserving knowledge from previous datasets, leading to a significant decline in performance on earlier test sets. To tackle these challenges, we present a continual gait recognition task, termed GaitAdapt, which supports the progressive enhancement of gait recognition capabilities over time and is systematically categorized according to various evaluation scenarios. Additionally, we propose GaitAdapter, a non-replay continual learning approach for gait recognition. This approach integrates the GaitPartition Adaptive Knowledge (GPAK) module, employing graph neural networks to aggregate common gait patterns from current data into a repository constructed from graph vectors. Subsequently, this repository is used to improve the discriminability of gait features in new tasks, thereby enhancing the model's ability to effectively recognize gait patterns. We also introduce a Euclidean Distance Stability Method (EDSN) based on negative pairs, which ensures that newly added gait samples from different classes maintain similar relative spatial distributions across both previous and current gait tasks, thereby alleviating the impact of task changes on the distinguishability of original domain features. Extensive evaluations demonstrate that GaitAdapter effectively retains gait knowledge acquired from diverse tasks, exhibiting markedly superior discriminative capability compared to alternative methods.

NCAug 1, 2025
HetSyn: Versatile Timescale Integration in Spiking Neural Networks via Heterogeneous Synapses

Zhichao Deng, Zhikun Liu, Junxue Wang et al.

Spiking Neural Networks (SNNs) offer a biologically plausible and energy-efficient framework for temporal information processing. However, existing studies overlook a fundamental property widely observed in biological neurons-synaptic heterogeneity, which plays a crucial role in temporal processing and cognitive capabilities. To bridge this gap, we introduce HetSyn, a generalized framework that models synaptic heterogeneity with synapse-specific time constants. This design shifts temporal integration from the membrane potential to the synaptic current, enabling versatile timescale integration and allowing the model to capture diverse synaptic dynamics. We implement HetSyn as HetSynLIF, an extended form of the leaky integrate-and-fire (LIF) model equipped with synapse-specific decay dynamics. By adjusting the parameter configuration, HetSynLIF can be specialized into vanilla LIF neurons, neurons with threshold adaptation, and neuron-level heterogeneous models. We demonstrate that HetSynLIF not only improves the performance of SNNs across a variety of tasks-including pattern generation, delayed match-to-sample, speech recognition, and visual recognition-but also exhibits strong robustness to noise, enhanced working memory performance, efficiency under limited neuron resources, and generalization across timescales. In addition, analysis of the learned synaptic time constants reveals trends consistent with empirical observations in biological synapses. These findings underscore the significance of synaptic heterogeneity in enabling efficient neural computation, offering new insights into brain-inspired temporal modeling.

CVMay 6, 2023
Prompt What You Need: Enhancing Segmentation in Rainy Scenes with Anchor-based Prompting

Xiaoyu Guo, Xiang Wei, Qi Su et al.

Semantic segmentation in rainy scenes is a challenging task due to the complex environment, class distribution imbalance, and limited annotated data. To address these challenges, we propose a novel framework that utilizes semi-supervised learning and pre-trained segmentation foundation model to achieve superior performance. Specifically, our framework leverages the semi-supervised model as the basis for generating raw semantic segmentation results, while also serving as a guiding force to prompt pre-trained foundation model to compensate for knowledge gaps with entropy-based anchors. In addition, to minimize the impact of irrelevant segmentation masks generated by the pre-trained foundation model, we also propose a mask filtering and fusion mechanism that optimizes raw semantic segmentation results based on the principle of minimum risk. The proposed framework achieves superior segmentation performance on the Rainy WCity dataset and is awarded the first prize in the sub-track of STRAIN in ICME 2023 Grand Challenges.

CVMar 5, 2018
Improving the Improved Training of Wasserstein GANs: A Consistency Term and Its Dual Effect

Xiang Wei, Boqing Gong, Zixia Liu et al.

Despite being impactful on a variety of problems and applications, the generative adversarial nets (GANs) are remarkably difficult to train. This issue is formally analyzed by \cite{arjovsky2017towards}, who also propose an alternative direction to avoid the caveats in the minmax two-player training of GANs. The corresponding algorithm, called Wasserstein GAN (WGAN), hinges on the 1-Lipschitz continuity of the discriminator. In this paper, we propose a novel approach to enforcing the Lipschitz continuity in the training procedure of WGANs. Our approach seamlessly connects WGAN with one of the recent semi-supervised learning methods. As a result, it gives rise to not only better photo-realistic samples than the previous methods but also state-of-the-art semi-supervised learning results. In particular, our approach gives rise to the inception score of more than 5.0 with only 1,000 CIFAR-10 images and is the first that exceeds the accuracy of 90% on the CIFAR-10 dataset using only 4,000 labeled images, to the best of our knowledge.