IT | Nov 5, 2022 | Code
Quantization Adaptor for Bit-Level Deep Learning-Based Massive MIMO CSI Feedback
Xudong Zhang, Zhilin Lu, Rui Zeng et al.
In massive multiple-input multiple-output (MIMO) systems, the user equipment (UE) needs to feed the channel state information (CSI) back to the base station (BS) for the following beamforming. But the large scale of antennas in massive MIMO systems causes huge feedback overhead. Deep learning (DL) based methods can compress the CSI at the UE and recover it at the BS, which reduces the feedback cost significantly. But the compressed CSI must be quantized into bit streams for transmission. In this paper, we propose an adaptor-assisted quantization strategy for bit-level DL-based CSI feedback. First, we design a network-aided adaptor and an advanced training scheme to adaptively improve the quantization and reconstruction accuracy. Moreover, for easy practical employment, we introduce the expert knowledge of data distribution and propose a pluggable and cost-free adaptor scheme. Experiments show that compared with the state-of-the-art feedback quantization method, this adaptor-aided quantization strategy can achieve better quantization accuracy and reconstruction performance with less or no additional cost. The open-source codes are available at https://github.com/zhang-xd18/QCRNet.
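The bit-level step mentioned in this abstract can be illustrated with a plain uniform quantizer. This is a minimal sketch, not the adaptor proposed in the paper: it assumes the compressed codeword entries already lie in [0, 1] (for example after a sigmoid), and the function names `quantize`, `to_bitstream`, `from_bitstream`, and `dequantize` are hypothetical.

```python
def quantize(codeword, bits):
    """Map each value in [0, 1] to an integer index in [0, 2**bits - 1]."""
    levels = 1 << bits
    # int() floors non-negative values; clamp so that 1.0 maps to the top level.
    return [min(int(v * levels), levels - 1) for v in codeword]


def to_bitstream(indices, bits):
    """Concatenate fixed-width binary representations into one bit string."""
    return "".join(format(q, f"0{bits}b") for q in indices)


def from_bitstream(stream, bits):
    """Split the bit string back into integer indices."""
    return [int(stream[i:i + bits], 2) for i in range(0, len(stream), bits)]


def dequantize(indices, bits):
    """Reconstruct each value as the midpoint of its quantization cell."""
    levels = 1 << bits
    return [(q + 0.5) / levels for q in indices]


if __name__ == "__main__":
    codeword = [0.0, 0.12, 0.5, 0.99, 1.0]  # toy compressed CSI codeword
    bits = 4
    stream = to_bitstream(quantize(codeword, bits), bits)
    recovered = dequantize(from_bitstream(stream, bits), bits)
    print(stream, recovered)
```

With midpoint reconstruction the per-entry error is bounded by half a cell width, 1 / 2**(bits + 1); a learned adaptor such as the one described above would aim to reduce the effect of that error on CSI reconstruction.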
SY | Mar 10, 2015
Gradient Compared Lp-LMS Algorithms for Sparse System Identification
Yong Feng, Jiasong Wu, Rui Zeng et al.
In this paper, we propose two novel p-norm penalty least mean square (Lp-LMS) algorithms as supplements of the conventional Lp-LMS algorithm established for sparse adaptive filtering recently. A gradient comparator is employed to selectively apply the zero attractor of p-norm constraint for only those taps that have the same polarity as that of the gradient of the squared instantaneous error, which leads to the new proposed gradient compared p-norm constraint LMS algorithm (LpGC-LMS). We explain that the LpGC-LMS can achieve lower mean square error than the standard Lp-LMS algorithm theoretically and experimentally. To further improve the performance of the filter, the LpNGC-LMS algorithm is derived using a new gradient comparator which takes the sign-smoothed version of the previous one. The performance of the LpNGC-LMS is superior to that of the LpGC-LMS in theory and in simulations. Moreover, these two comparators can be easily applied to other norm constraint LMS algorithms to derive some new approaches for sparse adaptive filtering. The numerical simulation results show that the two proposed algorithms achieve better performance than the standard LMS algorithm and Lp-LMS algorithm in terms of convergence rate and steady-state behavior in sparse system identification settings.
IT | Oct 29, 2022
Better Lightweight Network for Free: Codeword Mimic Learning for Massive MIMO CSI feedback
Zhilin Lu, Xudong Zhang, Rui Zeng et al.
The channel state information (CSI) needs to be fed back from the user equipment (UE) to the base station (BS) in frequency division duplexing (FDD) multiple-input multiple-output (MIMO) systems. Recently, neural networks have been widely applied to CSI compressed feedback since the original overhead is too large for massive MIMO systems. Notably, lightweight feedback networks attract special attention due to their practicality of deployment. However, the feedback accuracy is likely to be harmed by the network compression. In this paper, a cost-free distillation technique named codeword mimic (CM) is proposed to train better feedback networks with the practical lightweight encoder. A mimic-explore training strategy with a special distillation scheduler is designed to enhance the CM learning. Experiments show that the proposed CM learning outperforms the previous state-of-the-art feedback distillation method, boosting the performance of the lightweight feedback network without any extra inference cost.
CR | Sep 2, 2024
CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
Rui Zeng, Xi Chen, Yuwen Pu et al.
Backdoors can be injected into NLP models to induce misbehavior when the input text contains a specific feature, known as a trigger, which the attacker secretly selects. Unlike fixed words, phrases, or sentences used in the static text trigger, NLP dynamic backdoor attacks design triggers associated with abstract and latent text features, making them considerably stealthier than traditional static backdoor attacks. However, existing research on NLP backdoor detection primarily focuses on defending against static backdoor attacks, while detecting dynamic backdoors in NLP models remains largely unexplored. This paper presents CLIBE, the first framework to detect dynamic backdoors in Transformer-based NLP models. CLIBE injects a "few-shot perturbation" into the suspect Transformer model by crafting optimized weight perturbation in the attention layers to make the perturbed model classify a limited number of reference samples as a target label. Subsequently, CLIBE leverages the generalization ability of this few-shot perturbation to determine whether the original model contains a dynamic backdoor. Extensive evaluation on three advanced NLP dynamic backdoor attacks, two widely-used Transformer frameworks, and four real-world classification tasks strongly validates the effectiveness of CLIBE. We also demonstrate the robustness of CLIBE against various adaptive attacks. Furthermore, we employ CLIBE to scrutinize 49 popular Transformer models on Hugging Face and discover one exhibiting a high probability of containing a dynamic backdoor. We have contacted Hugging Face and provided detailed evidence of this model's backdoor behavior. Moreover, we extend CLIBE to detect backdoor text generation models modified to exhibit toxic behavior. To the best of our knowledge, CLIBE is the first framework capable of detecting backdoors in text generation models without access to trigger input test samples.
SY | Mar 3, 2015
p Norm Constraint Leaky LMS Algorithm for Sparse System Identification
Yong Feng, Rui Zeng, Jiasong Wu
This paper proposes a new leaky least mean square (leaky LMS, LLMS) algorithm in which a norm penalty is introduced to force the solution to be sparse in the application of system identification. The leaky LMS algorithm was derived because the performance of the standard LMS algorithm deteriorates when the input is highly correlated. However, neither algorithm takes sparsity information into account to improve its behavior. As a modification of the LLMS algorithm, the proposed algorithm, named Lp-LLMS, incorporates a p-norm penalty into the cost function of the LLMS to obtain a shrinkage term in the weight update equation, which enhances the performance of the filter in system identification settings, especially when the impulse response is sparse. The simulation results verify that the proposed algorithm improves the performance of the filter in sparse system settings in the presence of noisy input signals.
SY | Mar 3, 2015
p-norm-like Constraint Leaky LMS Algorithm for Sparse System Identification
Yong Feng, Rui Zeng, Jiasong Wu
In this paper, we propose a novel leaky least mean square (leaky LMS, LLMS) algorithm which employs a p-norm-like constraint to force the solution to be sparse in the application of system identification. As an extension of the LMS algorithm, which is the most widely used adaptive filtering technique, the LLMS algorithm was proposed decades ago to address the deteriorated performance of the standard LMS algorithm with highly correlated input. However, neither algorithm exploits sparsity information to improve its behavior. As a sparse-aware modification of the LLMS, our proposed Lplike-LLMS algorithm incorporates a p-norm-like penalty into the cost function of the LLMS to obtain a shrinkage term in the weight update, which enhances the performance in sparse system identification settings. The simulation results show that the proposed algorithm improves the performance of the filter in sparse system settings in the presence of noisy input signals.
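A leaky LMS update with a p-norm-like shrinkage term can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact recursion: the penalty is taken as the sum of |w_i|^p with 0 < p < 1, its gradient is regularized by a small constant `eps` to avoid division by zero, and the step sizes `mu`, `gamma`, and `kappa` are illustrative values chosen for this toy example.

```python
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lp_like_leaky_lms(x, d, n_taps, mu=0.02, gamma=1e-4, kappa=1e-4,
                      p=0.5, eps=0.05):
    """Leaky LMS with a p-norm-like zero attractor (hypothetical parameters)."""
    w = [0.0] * n_taps
    buf = [0.0] * n_taps  # most recent input samples, newest first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        e = dn - sum(wi * bi for wi, bi in zip(w, buf))  # a priori error
        w = [
            (1.0 - mu * gamma) * wi                  # leakage
            + mu * e * bi                            # standard LMS correction
            - kappa * p * sign(wi) / (eps + abs(wi) ** (1.0 - p))  # shrinkage
            for wi, bi in zip(w, buf)
        ]
    return w


def demo(seed=0, n=3000):
    """Identify a sparse 8-tap system from white Gaussian input."""
    rng = random.Random(seed)
    h = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, -0.5, 0.0]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    d, buf = [], [0.0] * len(h)
    for xn in x:
        buf = [xn] + buf[:-1]
        d.append(sum(hi * bi for hi, bi in zip(h, buf)) + rng.gauss(0.0, 0.01))
    return h, lp_like_leaky_lms(x, d, len(h))


if __name__ == "__main__":
    h, w = demo()
    print([round(v, 3) for v in w])
```

The shrinkage term pulls small taps toward zero much more strongly than large ones, which is why this kind of penalty suits sparse impulse responses.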
IT | Feb 5, 2023
Towards Efficient Subarray Hybrid Beamforming: Attention Network-based Practical Feedback in FDD Massive MU-MIMO Systems
Zhilin Lu, Xudong Zhang, Rui Zeng et al.
Channel state information (CSI) feedback is necessary for the frequency division duplexing (FDD) multiple input multiple output (MIMO) systems due to the channel non-reciprocity. With the help of deep learning, many works have succeeded in rebuilding the compressed ideal CSI for massive MIMO. However, simple CSI reconstruction is of limited practicality since the channel estimation and the targeted beamforming design are not considered. In this paper, a jointly optimized network is introduced for channel estimation and feedback so that a spectral-efficient beamformer can be learned. Moreover, the deployment-friendly subarray hybrid beamforming architecture is applied and a practical lightweight end-to-end network is specially designed. Experiments show that the proposed network is over 10 times lighter at the resource-sensitive user equipment compared with the previous state-of-the-art method with only a minor performance loss.
SP | Jan 2, 2024 | Code
Enhancing Automatic Modulation Recognition through Robust Global Feature Extraction
Yunpeng Qu, Zhilin Lu, Rui Zeng et al.
Automatic Modulation Recognition (AMR) plays a crucial role in wireless communication systems. Deep learning AMR strategies have achieved tremendous success in recent years. Modulated signals exhibit long temporal dependencies, and extracting global features is crucial in identifying modulation schemes. Traditionally, human experts analyze patterns in constellation diagrams to classify modulation schemes. Classical convolutional-based networks, due to their limited receptive fields, excel at extracting local features but struggle to capture global relationships. To address this limitation, we introduce a novel hybrid deep framework named TLDNN, which incorporates the architectures of the transformer and long short-term memory (LSTM). We utilize the self-attention mechanism of the transformer to model the global correlations in signal sequences while employing LSTM to enhance the capture of temporal dependencies. To mitigate the impact of factors such as RF fingerprint features and channel characteristics on model generalization, we propose a data augmentation strategy known as segment substitution (SS) to enhance the model's robustness to modulation-related features. Experimental results on widely-used datasets demonstrate that our method achieves state-of-the-art performance and exhibits significant advantages in terms of complexity. Our proposed framework serves as a foundational backbone that can be extended to different datasets. We have verified the effectiveness of our augmentation approach in enhancing the generalization of the models, particularly in few-shot scenarios. Code is available at https://github.com/AMR-Master/TLDNN.
IT | Feb 15, 2023
Deep Learning for Hybrid Beamforming with Finite Feedback in GSM Aided mmWave MIMO Systems
Zhilin Lu, Xudong Zhang, Rui Zeng et al.
Hybrid beamforming is widely recognized as an important technique for millimeter wave (mmWave) multiple input multiple output (MIMO) systems. Generalized spatial modulation (GSM) is further introduced to improve the spectrum efficiency. However, most of the existing works on beamforming assume the perfect channel state information (CSI), which is unrealistic in practical systems. In this paper, joint optimization of downlink pilot training, channel estimation, CSI feedback, and hybrid beamforming is considered in GSM aided frequency division duplexing (FDD) mmWave MIMO systems. With the help of deep learning, the GSM hybrid beamformers are designed via unsupervised learning in an end-to-end way. Experiments show that the proposed multi-resolution network named GsmEFBNet can reach a better achievable rate with fewer feedback bits compared with the conventional algorithm.
CR | Dec 24, 2024 | Code
Detecting and Interpreting NSFW Prompts in Text-to-Image Models through Uncovering Harmful Semantics
Yiming Wang, Jiahao Chen, Qingming Li et al.
As text-to-image (T2I) models advance and gain widespread adoption, their associated safety concerns are becoming increasingly critical. Malicious users exploit these models to generate Not-Safe-for-Work (NSFW) images using harmful or adversarial prompts, underscoring the need for effective safeguards to ensure the integrity and compliance of model outputs. However, existing detection methods often exhibit low accuracy and inefficiency. In this paper, we propose HiddenGuard, an interpretable defense framework leveraging the hidden states of T2I models to detect NSFW prompts. HiddenGuard extracts NSFW features from the hidden states of the model's text encoder, utilizing the separable nature of these features to detect NSFW prompts. The detection process is efficient, requiring minimal inference time. HiddenGuard also offers real-time interpretation of results and supports optimization through data augmentation techniques. Our extensive experiments show that HiddenGuard significantly outperforms both commercial and open-source moderation tools, achieving over 95% accuracy across all datasets and greatly improving computational efficiency.
LG | Feb 10
Contextual and Seasonal LSTMs for Time Series Anomaly Detection
Lingpei Zhang, Qingming Li, Yong Yang et al.
Univariate time series (UTS), where each timestamp records a single variable, serve as crucial indicators in web systems and cloud servers. Anomaly detection in UTS plays an essential role in both data mining and system reliability management. However, existing reconstruction-based and prediction-based methods struggle to capture certain subtle anomalies, particularly small point anomalies and slowly rising anomalies. To address these challenges, we propose a novel prediction-based framework named Contextual and Seasonal LSTMs (CS-LSTMs). CS-LSTMs are built upon a noise decomposition strategy and jointly leverage contextual dependencies and seasonal patterns, thereby strengthening the detection of subtle anomalies. By integrating both time-domain and frequency-domain representations, CS-LSTMs achieve more accurate modeling of periodic trends and anomaly localization. Extensive evaluations on public benchmark datasets demonstrate that CS-LSTMs consistently outperform state-of-the-art methods, highlighting their effectiveness and practical value in robust time series anomaly detection.
AI | Nov 14, 2024
Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents
Yuyou Gan, Yong Yang, Zhe Ma et al.
With the continuous development of large language models (LLMs), transformer-based models have made groundbreaking advances in numerous natural language processing (NLP) tasks, leading to the emergence of a series of agents that use LLMs as their control hub. While LLMs have achieved success in various tasks, they face numerous security and privacy threats, which become even more severe in the agent scenarios. To enhance the reliability of LLM-based applications, a range of research has emerged to assess and mitigate these risks from different perspectives. To help researchers gain a comprehensive understanding of various risks, this survey collects and analyzes the different threats faced by these agents. To address the challenges posed by previous taxonomies in handling cross-module and cross-stage threats, we propose a novel taxonomy framework based on the sources and impacts. Additionally, we identify six key features of LLM-based agents, based on which we summarize the current research progress and analyze their limitations. Subsequently, we select four representative agents as case studies to analyze the risks they may face in practical use. Finally, based on the aforementioned analyses, we propose future research directions from the perspectives of data, methodology, and policy, respectively.
SP | Jun 16, 2020
GCNs-Net: A Graph Convolutional Neural Network Approach for Decoding Time-resolved EEG Motor Imagery Signals
Yimin Hou, Shuyue Jia, Xiangmin Lun et al.
Towards developing effective and efficient brain-computer interface (BCI) systems, precise decoding of brain activity measured by electroencephalogram (EEG) is in high demand. Traditional works classify EEG signals without considering the topological relationship among electrodes. However, neuroscience research has increasingly emphasized network patterns of brain dynamics. Thus, the Euclidean structure of electrodes might not adequately reflect the interaction between signals. To fill the gap, a novel deep learning framework based on graph convolutional neural networks (GCNs) is presented to enhance the decoding performance of raw EEG signals during different types of motor imagery (MI) tasks while incorporating the functional topological relationship of electrodes. Based on the absolute Pearson's matrix of overall signals, the graph Laplacian of EEG electrodes is built up. The GCNs-Net constructed by graph convolutional layers learns the generalized features. The subsequent pooling layers reduce dimensionality, and the fully-connected softmax layer derives the final prediction. The introduced approach has been shown to converge for both personalized and group-wise predictions. It has achieved the highest averaged accuracy, 93.06% and 88.57% (PhysioNet Dataset), 96.24% and 80.89% (High Gamma Dataset), at the subject and group level, respectively, compared with existing studies, which suggests adaptability and robustness to individual variability. Moreover, the performance is stably reproducible among repetitive experiments for cross-validation. The excellent performance of our method has shown that it is an important step towards better BCI approaches. To conclude, the GCNs-Net filters EEG signals based on the functional topological relationship, which manages to decode relevant features for brain motor imagery.
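Building a graph Laplacian from an absolute Pearson matrix, as this abstract describes, might look like the sketch below. It is one plausible reading, not the paper's exact construction: self-loops are removed from the adjacency matrix, the unnormalized Laplacian L = D - A is used (a normalized variant would be an equally reasonable choice), and every channel is assumed to have non-zero variance.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def graph_laplacian(signals):
    """Return (adjacency, laplacian) for a list of per-electrode signals.

    Edge weights are absolute Pearson correlations; the diagonal is zeroed
    so that the graph has no self-loops (an assumption of this sketch).
    """
    n = len(signals)
    adj = [
        [0.0 if i == j else abs(pearson(signals[i], signals[j]))
         for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in adj]
    lap = [
        [(degree[i] if i == j else 0.0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]
    return adj, lap


if __name__ == "__main__":
    # Three toy "electrode" recordings with four samples each.
    channels = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1.5]]
    adj, lap = graph_laplacian(channels)
    for row in lap:
        print([round(v, 3) for v in row])
```

Taking the absolute value treats strongly anti-correlated electrodes as strongly connected, so the graph encodes the strength of the functional relationship rather than its sign.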
CV | Apr 28, 2020
Deep Auto-Encoders with Sequential Learning for Multimodal Dimensional Emotion Recognition
Dung Nguyen, Duc Thanh Nguyen, Rui Zeng et al.
Multimodal dimensional emotion recognition has drawn great attention from the affective computing community, and numerous schemes have been extensively investigated, making significant progress in this area. However, several questions remain unanswered for most existing approaches, including: (i) how to simultaneously learn compact yet representative features from multimodal data, (ii) how to effectively capture complementary features from multimodal streams, and (iii) how to perform all the tasks in an end-to-end manner. To address these challenges, in this paper, we propose a novel deep neural network architecture consisting of a two-stream auto-encoder and a long short-term memory for effectively integrating visual and audio signal streams for emotion recognition. To validate the robustness of our proposed architecture, we carry out extensive experiments on the multimodal emotion in the wild dataset: RECOLA. Experimental results show that the proposed method achieves state-of-the-art recognition performance and surpasses existing schemes by a significant margin.
CV | Mar 24, 2020
Joint Deep Cross-Domain Transfer Learning for Emotion Recognition
Dung Nguyen, Sridha Sridharan, Duc Thanh Nguyen et al.
Deep learning has been applied to achieve significant progress in emotion recognition. Despite such substantial progress, existing approaches are still hindered by insufficient training data, and the resulting models do not generalize well under mismatched conditions. To address this challenge, we propose a learning strategy which jointly transfers the knowledge learned from rich datasets to source-poor datasets. Our method is also able to learn cross-domain features which lead to improved recognition performance. To demonstrate the robustness of our proposed framework, we conducted experiments on three benchmark emotion datasets including eNTERFACE, SAVEE, and EMODB. Experimental results show that the proposed method surpassed state-of-the-art transfer learning schemes by a significant margin.
CV | Dec 16, 2019
MTRNet++: One-stage Mask-based Scene Text Eraser
Osman Tursun, Simon Denman, Rui Zeng et al.
A precise, controllable, interpretable and easily trainable text removal approach is necessary for both user-specific and large-scale text removal applications. To achieve this, we propose a one-stage mask-based text inpainting network, MTRNet++. It has a novel architecture that includes mask-refine, coarse-inpainting and fine-inpainting branches, and attention blocks. With this architecture, MTRNet++ can remove text either with or without an external mask. It achieves state-of-the-art results on both the Oxford and SCUT datasets without using external ground-truth masks. The results of ablation studies demonstrate that the proposed multi-branch architecture with attention blocks is effective and essential. It also demonstrates controllability and interpretability.
IV | Oct 11, 2019
Adversarial Pulmonary Pathology Translation for Pairwise Chest X-ray Data Augmentation
Yunyan Xing, Zongyuan Ge, Rui Zeng et al.
Recent works show that Generative Adversarial Networks (GANs) can be successfully applied to chest X-ray data augmentation for lung disease recognition. However, the implausible and distorted pathology features generated from a less-than-perfect generator may lead to wrong clinical decisions. Why not keep the original pathology region? We propose a novel approach that allows our generative model to generate high-quality plausible images that contain undistorted pathology areas. The main idea is to design a training scheme based on an image-to-image translation network to introduce variations of new lung features around the pathology ground-truth area. Moreover, our model is able to leverage both annotated disease images and unannotated healthy lung images for the purpose of generation. We demonstrate the effectiveness of our model on two tasks: (i) we invite certified radiologists to assess the quality of the generated synthetic images against real images and images from other state-of-the-art generative models, and (ii) data augmentation to improve the performance of disease localisation.
CV | Mar 19, 2019
Geometry-constrained Car Recognition Using a 3D Perspective Network
Rui Zeng, Zongyuan Ge, Simon Denman et al.
We present a novel learning framework for vehicle recognition from a single RGB image. Unlike existing methods which only use attention mechanisms to locate 2D discriminative information, our work learns a novel 3D perspective feature representation of a vehicle, which is then fused with the 2D appearance feature to predict the category. The framework is composed of a global network (GN), a 3D perspective network (3DPN), and a fusion network. The GN is used to locate the region of interest (RoI) and generate the 2D global feature. With the assistance of the RoI, the 3DPN estimates the 3D bounding box under the guidance of the proposed vanishing point loss, which provides a perspective geometry constraint. Then the proposed 3D representation is generated by eliminating the viewpoint variance of the 3D bounding box using perspective transformation. Finally, the 3D and 2D features are fused to predict the category of the vehicle. We present qualitative and quantitative results on the vehicle classification and verification tasks in the BoxCars dataset. The results demonstrate that, by learning such a concise 3D representation, we can achieve superior performance to methods that only use 2D information while retaining meaningful 3D information without the challenge of requiring a 3D CAD model.
CV | Mar 11, 2019
MTRNet: A Generic Scene Text Eraser
Osman Tursun, Rui Zeng, Simon Denman et al.
Text removal algorithms have been proposed for uni-lingual scripts with regular shapes and layouts. However, to the best of our knowledge, a generic text removal method which is able to remove all or user-specified text regions regardless of font, script, language or shape is not available. Developing such a generic text eraser for real scenes is a challenging task, since it inherits all the challenges of multi-lingual and curved text detection and inpainting. To fill this gap, we propose a mask-based text removal network (MTRNet). MTRNet is a conditional adversarial generative network (cGAN) with an auxiliary mask. The introduced auxiliary mask not only makes the cGAN a generic text eraser, but also enables stable training and early convergence on a challenging large-scale synthetic dataset, initially proposed for text detection in real scenes. What's more, MTRNet achieves state-of-the-art results on several real-world datasets including ICDAR 2013, ICDAR 2017 MLT, and CTW1500, without being explicitly trained on this data, outperforming previous state-of-the-art methods trained directly on these datasets.
LG | Dec 20, 2015
Kernel principal component analysis network for image classification
Dan Wu, Jiasong Wu, Rui Zeng et al.
In order to classify nonlinear features with a linear classifier and improve the classification accuracy, a deep learning network named kernel principal component analysis network (KPCANet) is proposed. First, the data are mapped into a higher-dimensional space with kernel principal component analysis to make them linearly separable. Then, a two-layer KPCANet is built to obtain the principal components of the image. Finally, the principal components are classified with a linear classifier. Experimental results show that the proposed KPCANet is effective in face recognition, object recognition, and handwritten digit recognition, and that it generally outperforms the principal component analysis network (PCANet). Besides, KPCANet is invariant to illumination and stable to occlusion and slight deformation.
SY | Sep 26, 2015
Error Gradient-based Variable-Lp Norm Constraint LMS Algorithm for Sparse System Identification
Yong Feng, Fei Chen, Rui Zeng et al.
Sparse adaptive filtering has gained much attention due to its wide applicability in the field of signal processing. Among the main algorithm families, sparse norm constraint adaptive filters have developed rapidly in recent years. However, when applied to system identification, most prior work in sparse norm constraint adaptive filtering has difficulty adapting to the sparsity of the systems to be identified. To address this problem, we propose a novel variable p-norm constraint least mean square (LMS) algorithm, which serves as a variant of the conventional Lp-LMS algorithm established for sparse system identification. The parameter p is iteratively adjusted by the gradient descent method applied to the instantaneous square error. Numerical simulations show that this new approach achieves better performance than the traditional Lp-LMS and LMS algorithms in terms of steady-state error and convergence rate.
CV | Mar 5, 2015
Color Image Classification via Quaternion Principal Component Analysis Network
Rui Zeng, Jiasong Wu, Zhuhong Shao et al.
The Principal Component Analysis Network (PCANet), which is one of the recently proposed deep learning architectures, achieves state-of-the-art classification accuracy on various databases. However, the performance of PCANet may be degraded when dealing with color images. In this paper, a Quaternion Principal Component Analysis Network (QPCANet), which is an extension of PCANet, is proposed for color image classification. Compared to PCANet, the proposed QPCANet takes into account the spatial distribution information of color images and ensures a larger amount of intra-class invariance of color images. Experiments conducted on different color image datasets such as Caltech-101, UC Merced Land Use, Georgia Tech face and CUReT have revealed that the proposed QPCANet achieves higher classification accuracy than PCANet.
CV | Nov 5, 2014
Tensor object classification via multilinear discriminant analysis network
Rui Zeng, Jiasong Wu, Lotfi Senhadji et al.
This paper proposes a multilinear discriminant analysis network (MLDANet) for the recognition of multidimensional objects, known as tensor objects. The MLDANet is a variation of the linear discriminant analysis network (LDANet) and the principal component analysis network (PCANet), both of which are recently proposed deep learning algorithms. The MLDANet consists of three parts: 1) the encoder learned by MLDA from tensor data; 2) feature maps obtained from the decoder; 3) the use of binary hashing and histograms for feature pooling. A learning algorithm for MLDANet is described. Evaluations on the UCF11 database indicate that the proposed MLDANet outperforms the PCANet, LDANet, MPCA + LDA, and MLDA in terms of classification for tensor objects.
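The binary hashing and histogram pooling stage named in this abstract can be sketched in the style commonly used by PCANet-type networks. This is an assumption about the general technique rather than the MLDANet implementation: each feature map is binarized with a step function at zero, the bits are combined into one integer per pixel, and a single histogram is taken over the whole map (block-wise histograms are omitted for brevity). The function names are hypothetical.

```python
def binary_hash(feature_maps):
    """Combine L real-valued 2D maps into one integer-valued map.

    Map l contributes bit 2**l wherever its response is positive.
    """
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [
            sum(1 << l for l, fm in enumerate(feature_maps) if fm[r][c] > 0)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def histogram(hashed, n_maps):
    """Count occurrences of each of the 2**n_maps possible hash codes."""
    bins = [0] * (1 << n_maps)
    for row in hashed:
        for code in row:
            bins[code] += 1
    return bins


if __name__ == "__main__":
    # Two toy 2x2 feature maps standing in for filter responses.
    maps = [
        [[1.0, -1.0], [0.5, -2.0]],
        [[-1.0, -1.0], [3.0, 4.0]],
    ]
    hashed = binary_hash(maps)
    print(hashed, histogram(hashed, len(maps)))
```

The resulting histogram serves as a fixed-length feature vector regardless of the spatial size of the input maps.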
CV | Nov 5, 2014
Multilinear Principal Component Analysis Network for Tensor Object Classification
Rui Zeng, Jiasong Wu, Zhuhong Shao et al.
The recently proposed principal component analysis network (PCANet) has been shown to achieve high performance for visual content classification. In this letter, we develop a tensorial extension of PCANet, namely, the multilinear principal component analysis network (MPCANet), for tensor object classification. Compared to PCANet, the proposed MPCANet uses the spatial structure and the relationship between each dimension of tensor objects much more efficiently. Experiments were conducted on different visual content datasets, including the UCF sports action video sequences database and the UCF11 database. The experimental results have revealed that the proposed MPCANet achieves higher classification accuracy than PCANet for tensor object classification.