CLApr 16, 2023Code
ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital HumanJunfeng Tian, Hehong Chen, Guohai Xu et al.
In this paper, we present ChatPLUG, a Chinese open-domain dialogue system for digital human applications that instruction finetunes on a wide range of dialogue tasks in a unified internet-augmented format. Different from other open-domain dialogue models that focus on large-scale pre-training and scaling up model size or dialogue corpus, we aim to build a powerful and practical dialogue system for digital human with diverse skills and good multi-task generalization by internet-augmented instruction tuning. To this end, we first conduct large-scale pre-training on both common document corpus and dialogue data with curriculum learning, so as to inject various world knowledge and dialogue abilities into ChatPLUG. Then, we collect a wide range of dialogue tasks spanning diverse features of knowledge, personality, multi-turn memory, and empathy, on which we further instruction tune \modelname via unified natural language instruction templates. External knowledge from an internet search is also used during instruction finetuning for alleviating the problem of knowledge hallucinations. We show that \modelname outperforms state-of-the-art Chinese dialogue systems on both automatic and human evaluation, and demonstrates strong multi-task generalization on a variety of text understanding and generation tasks. In addition, we deploy \modelname to real-world applications such as Smart Speaker and Instant Message applications with fast inference. Our models and code will be made publicly available on ModelScope: https://modelscope.cn/models/damo/ChatPLUG-3.7B and Github: https://github.com/X-PLUG/ChatPLUG .
CVJul 4, 2023
Synchronous Image-Label Diffusion Probability Model with Application to Stroke Lesion Segmentation on Non-contrast CTJianhai Zhang, Tonghua Wan, Ethan MacDonald et al.
Stroke lesion volume is a key radiologic measurement for assessing the prognosis of Acute Ischemic Stroke (AIS) patients, which is challenging to be automatically measured on Non-Contrast CT (NCCT) scans. Recent diffusion probabilistic models have shown potentials of being used for image segmentation. In this paper, a novel Synchronous image-label Diffusion Probability Model (SDPM) is proposed for stroke lesion segmentation on NCCT using Markov diffusion process. The proposed SDPM is fully based on a Latent Variable Model (LVM), offering a complete probabilistic elaboration. An additional net-stream, parallel with a noise prediction stream, is introduced to obtain initial noisy label estimates for efficiently inferring the final labels. By optimizing the specified variational boundaries, the trained model can infer multiple label estimates for reference given the input images with noises. The proposed model was assessed on three stroke lesion datasets including one public and two private datasets. Compared to several U-net and transformer-based segmentation methods, our proposed SDPM model is able to achieve state-of-the-art performance. The code is publicly available.
CLApr 11, 2022
MGIMN: Multi-Grained Interactive Matching Network for Few-shot Text ClassificationJianhai Zhang, Mieradilijiang Maimaiti, Xing Gao et al.
Text classification struggles to generalize to unseen classes with very few labeled text instances per class. In such a few-shot learning (FSL) setting, metric-based meta-learning approaches have shown promising results. Previous studies mainly aim to derive a prototype representation for each class. However, they neglect that it is challenging-yet-unnecessary to construct a compact representation which expresses the entire meaning for each class. They also ignore the importance to capture the inter-dependency between query and the support set for few-shot text classification. To deal with these issues, we propose a meta-learning based method MGIMN which performs instance-wise comparison followed by aggregation to generate class-wise matching vectors instead of prototype learning. The key of instance-wise comparison is the interactive matching within the class-specific context and episode-specific context. Extensive experiments demonstrate that the proposed method significantly outperforms the existing state-of-the-art approaches, under both the standard FSL and generalized FSL settings.
CVJan 29, 2024
Adversarial Training on Purification (AToP): Advancing Both Robustness and GeneralizationGuang Lin, Chao Li, Jianhai Zhang et al.
The deep neural networks are known to be vulnerable to well-designed adversarial attacks. The most successful defense technique based on adversarial training (AT) can achieve optimal robustness against particular attacks but cannot generalize well to unseen attacks. Another effective defense technique based on adversarial purification (AP) can enhance generalization but cannot achieve optimal robustness. Meanwhile, both methods share one common limitation on the degraded standard accuracy. To mitigate these issues, we propose a novel pipeline to acquire the robust purifier model, named Adversarial Training on Purification (AToP), which comprises two components: perturbation destruction by random transforms (RT) and purifier model fine-tuned (FT) by adversarial loss. RT is essential to avoid overlearning to known attacks, resulting in the robustness generalization to unseen attacks, and FT is essential for the improvement of robustness. To evaluate our method in an efficient and scalable way, we conduct extensive experiments on CIFAR-10, CIFAR-100, and ImageNette to demonstrate that our method achieves optimal robustness and exhibits generalization ability against unseen attacks.
CVMar 24, 2024
Adversarial Guided Diffusion Models for Adversarial PurificationGuang Lin, Zerui Tao, Jianhai Zhang et al.
Diffusion model (DM) based adversarial purification (AP) has proven to be a powerful defense method that can remove adversarial perturbations and generate a purified example without threats. In principle, the pre-trained DMs can only ensure that purified examples conform to the same distribution of the training data, but it may inadvertently compromise the semantic information of input examples, leading to misclassification of purified examples. Recent advancements introduce guided diffusion techniques to preserve semantic information while removing the perturbations. However, these guidances often rely on distance measures between purified examples and diffused examples, which can also preserve perturbations in purified examples. To further unleash the robustness power of DM-based AP, we propose an adversarial guided diffusion model (AGDM) by introducing a novel adversarial guidance that contains sufficient semantic information but does not explicitly involve adversarial perturbations. The guidance is modeled by an auxiliary neural network obtained with adversarial training, considering the distance in the latent representations rather than at the pixel-level values. Extensive experiments are conducted on CIFAR-10, CIFAR-100 and ImageNet to demonstrate that our method is effective for simultaneously maintaining semantic information and removing the adversarial perturbations. In addition, comprehensive comparisons show that our method significantly enhances the robustness of existing DM-based AP, with an average robust accuracy improved by up to 7.30% on CIFAR-10.
LGNov 3, 2020
BCGGAN: Ballistocardiogram artifact removal in simultaneous EEG-fMRI using generative adversarial networkGuang Lin, Jianhai Zhang, Yuxi Liu et al.
Due to its advantages of high temporal and spatial resolution, the technology of simultaneous electroencephalogram-functional magnetic resonance imaging (EEG-fMRI) acquisition and analysis has attracted much attention, and has been widely used in various research fields of brain science. However, during the fMRI of the brain, ballistocardiogram (BCG) artifacts can seriously contaminate the EEG. As an unpaired problem, BCG artifact removal now remains a considerable challenge. Aiming to provide a solution, this paper proposed a novel modular generative adversarial network (GAN) and corresponding training strategy to improve the network performance by optimizing the parameters of each module. In this manner, we hope to improve the local representation ability of the network model, thereby improving its overall performance and obtaining a reliable generator for BCG artifact removal. Moreover, the proposed method does not rely on additional reference signal or complex hardware equipment. Experimental results show that, compared with multiple methods, the technique presented in this paper can remove the BCG artifact more effectively while retaining essential EEG information.
CLAug 1, 2019
Simple and Effective Text Matching with Richer Alignment FeaturesRunqi Yang, Jianhai Zhang, Xing Gao et al.
In this paper, we present a fast and strong neural approach for general purpose text matching applications. We explore what is sufficient to build a fast and well-performed text matching model and propose to keep three key features available for inter-sequence alignment: original point-wise features, previous aligned features, and contextual features while simplifying all the remaining components. We conduct experiments on four well-studied benchmark datasets across tasks of natural language inference, paraphrase identification and answer selection. The performance of our model is on par with the state-of-the-art on all datasets with much fewer parameters and the inference speed is at least 6 times faster compared with similarly performed ones.