8.7CLOct 14, 2022
Watermarking Pre-trained Language Models with BackdooringChenxi Gu, Chengsong Huang, Xiaoqing Zheng et al.
Large pre-trained language models (PLMs) have proven to be a crucial component of modern natural language processing systems. PLMs typically need to be fine-tuned on task-specific downstream datasets, which makes it hard to claim the ownership of PLMs and protect the developer's intellectual property due to the catastrophic forgetting phenomenon. We show that PLMs can be watermarked with a multi-task learning framework by embedding backdoors triggered by specific inputs defined by the owners, and those watermarks are hard to remove even though the watermarked PLMs are fine-tuned on multiple downstream tasks. In addition to using some rare words as triggers, we also show that the combination of common words can be used as backdoor triggers to avoid them being easily detected. Extensive experiments on multiple datasets demonstrate that the embedded watermarks can be robustly extracted with a high success rate and less influenced by the follow-up fine-tuning.
Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query RefinementXiaohua Wang, Zisu Huang, Feiran Zhang et al.
The capacity of large language models (LLMs) to generate honest, harmless, and helpful responses heavily relies on the quality of user prompts. However, these prompts often tend to be brief and vague, thereby significantly limiting the full potential of LLMs. Moreover, harmful prompts can be meticulously crafted and manipulated by adversaries to jailbreak LLMs, inducing them to produce potentially toxic content. To enhance the capabilities of LLMs while maintaining strong robustness against harmful jailbreak inputs, this study proposes a transferable and pluggable framework that refines user prompts before they are input into LLMs. This strategy improves the quality of the queries, empowering LLMs to generate more truthful, benign and useful responses. Specifically, a lightweight query refinement model is introduced and trained using a specially designed reinforcement learning approach that incorporates multiple objectives to enhance particular capabilities of LLMs. Extensive experiments demonstrate that the refinement model not only improves the quality of responses but also strengthens their robustness against jailbreak attacks. Code is available at: https://github.com/Huangzisu/query-refinement .
Unraveling the Enigma of Double Descent: An In-depth Analysis through the Lens of Learned Feature SpaceYufei Gu, Xiaoqing Zheng, Tomaso Aste
Double descent presents a counter-intuitive aspect within the machine learning domain, and researchers have observed its manifestation in various models and tasks. While some theoretical explanations have been proposed for this phenomenon in specific contexts, an accepted theory to account for its occurrence in deep learning remains yet to be established. In this study, we revisit the phenomenon of double descent and demonstrate that its occurrence is strongly influenced by the presence of noisy data. Through conducting a comprehensive analysis of the feature space of learned representations, we unveil that double descent arises in imperfect models trained with noisy data. We argue that double descent is a consequence of the model first learning the noisy data until interpolation and then adding implicit regularization via over-parameterization acquiring therefore capability to separate the information from the noise.
Improving Continual Pre-training Through Seamless Data PackingRuicheng Yin, Xuan Gao, Changze Lv et al.
Continual pre-training has demonstrated significant potential in enhancing model performance, particularly in domain-specific scenarios. The most common approach for packing data before continual pre-training involves concatenating input texts and splitting them into fixed-length sequences. While straightforward and efficient, this method often leads to excessive truncation and context discontinuity, which can hinder model performance. To address these issues, we explore the potential of data engineering to enhance continual pre-training, particularly its impact on model performance and efficiency. We propose Seamless Packing (SP), a novel data packing strategy aimed at preserving contextual information more effectively and enhancing model performance. Our approach employs a sliding window technique in the first stage that synchronizes overlapping tokens across consecutive sequences, ensuring better continuity and contextual coherence. In the second stage, we adopt a First-Fit-Decreasing algorithm to pack shorter texts into bins slightly larger than the target sequence length, thereby minimizing padding and truncation. Empirical evaluations across various model architectures and corpus domains demonstrate the effectiveness of our method, outperforming baseline method in 99% of all settings. Code is available at https://github.com/Infernus-WIND/Seamless-Packing.
Improving Empathetic Dialogue Generation by Dynamically Infusing Commonsense KnowledgeHua Cai, Xuli Shen, Qing Xu et al.
In empathetic conversations, individuals express their empathy towards others. Previous work has mainly focused on generating empathetic responses by utilizing the speaker's emotion. Besides, external commonsense knowledge has been applied to enhance the system's understandings of the speaker's situation. However, given an event, commonsense knowledge base contains various relations, potentially leading to confusion for the dialogue system. Consequently, inconsistencies arise among the emotion, generated response and speaker's contextual information. To this end, we propose a novel approach for empathetic response generation, which incorporates an adaptive module for commonsense knowledge selection to ensure consistency between the generated empathetic responses and the speaker's situation. This selected knowledge is used to refine the commonsense cognition and empathy expression for generated responses. Experimental results show that our approach significantly outperforms baseline models in both automatic and human evaluations, exhibiting the generation of more coherent and empathetic responses. Moreover, case studies highlight the interpretability of knowledge selection in the responses and the effectiveness of adaptive module in our model. Code: https://github.com/Hanscal/DCKS.
16.3CLJun 4, 2025
Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical ReasoningMuling Wu, Qi Qian, Wenhao Liu et al.
Large Language Models (LLMs) have achieved remarkable performance across various reasoning tasks, yet post-training is constrained by inefficient sample utilization and inflexible difficulty samples processing. To address these limitations, we propose Customized Curriculum Learning (CCL), a novel framework with two key innovations. First, we introduce model-adaptive difficulty definition that customizes curriculum datasets based on each model's individual capabilities rather than using predefined difficulty metrics. Second, we develop "Guided Prompting," which dynamically reduces sample difficulty through strategic hints, enabling effective utilization of challenging samples that would otherwise degrade performance. Comprehensive experiments on supervised fine-tuning and reinforcement learning demonstrate that CCL significantly outperforms uniform training approaches across five mathematical reasoning benchmarks, confirming its effectiveness across both paradigms in enhancing sample utilization and model performance.
4.9CLMar 6, 2025
Layer-Specific Scaling of Positional Encodings for Superior Long-Context ModelingZhenghua Wang, Yiran Ding, Changze Lv et al.
Although large language models (LLMs) have achieved significant progress in handling long-context inputs, they still suffer from the ``lost-in-the-middle'' problem, where crucial information in the middle of the context is often underrepresented or lost. Our extensive experiments reveal that this issue may arise from the rapid long-term decay in Rotary Position Embedding (RoPE). To address this problem, we propose a layer-specific positional encoding scaling method that assigns distinct scaling factors to each layer, slowing down the decay rate caused by RoPE to make the model pay more attention to the middle context. A specially designed genetic algorithm is employed to efficiently select the optimal scaling factors for each layer by incorporating Bezier curves to reduce the search space. Through comprehensive experimentation, we demonstrate that our method significantly alleviates the ``lost-in-the-middle'' problem. Our approach results in an average accuracy improvement of up to 20% on the Key-Value Retrieval dataset. Furthermore, we show that layer-specific interpolation, as opposed to uniform interpolation across all layers, enhances the model's extrapolation capabilities when combined with PI and Dynamic-NTK positional encoding schemes.
Certified Robustness to Text Adversarial Attacks by Randomized [MASK]Jiehang Zeng, Xiaoqing Zheng, Jianhan Xu et al.
Recently, few certified defense methods have been developed to provably guarantee the robustness of a text classifier to adversarial synonym substitutions. However, all existing certified defense methods assume that the defenders are informed of how the adversaries generate synonyms, which is not a realistic scenario. In this paper, we propose a certifiably robust defense method by randomly masking a certain proportion of the words in an input text, in which the above unrealistic assumption is no longer necessary. The proposed method can defend against not only word substitution-based attacks, but also character-level perturbations. We can certify the classifications of over 50% texts to be robust to any perturbation of 5 words on AGNEWS, and 2 words on SST2 dataset. The experimental results show that our randomized smoothing method significantly outperforms recently proposed defense methods across multiple datasets.
1.6CLMar 2, 2021
Unsupervised Word Segmentation with Bi-directional Neural Language ModelLihao Wang, Zongyi Li, Xiaoqing Zheng
We present an unsupervised word segmentation model, in which the learning objective is to maximize the generation probability of a sentence given its all possible segmentation. Such generation probability can be factorized into the likelihood of each possible segment given the context in a recursive way. In order to better capture the long- and short-term dependencies, we propose to use bi-directional neural language models to better capture the features of segment's context. Two decoding algorithms are also described to combine the context features from both directions to generate the final segmentation, which helps to reconcile word boundary ambiguities. Experimental results showed that our context-sensitive unsupervised segmentation model achieved state-of-the-art at different evaluation settings on various data sets for Chinese, and the comparable result for Thai.