LG · Aug 19, 2023
Generative Adversarial Networks Unlearning
Hui Sun, Tianqing Zhu, Wenhan Chang et al.
As machine learning continues to develop and data misuse scandals become more prevalent, individuals are increasingly concerned about their personal information and are advocating for the right to have their data removed. Machine unlearning has emerged as a solution for erasing training data from trained machine learning models. Despite its success with classifiers, research on unlearning in Generative Adversarial Networks (GANs) remains limited because of their distinctive architecture, which comprises a generator and a discriminator. One challenge concerns generator unlearning: the process may disrupt the continuity and completeness of the latent space and thereby reduce the model's effectiveness after unlearning. Another challenge is defining the criterion the discriminator should apply to the images being unlearned. In this paper, we introduce a substitution mechanism and define a fake label to mitigate these challenges. Based on the substitution mechanism and the fake label, we propose a cascaded unlearning approach for both item and class unlearning in GAN models, in which the unlearning and learning processes run in a cascaded manner. We conducted a comprehensive evaluation of the cascaded unlearning technique on the MNIST and CIFAR-10 datasets. Experimental results show that this approach significantly improves item and class unlearning efficiency, reducing the required time by up to 185x and 284x on MNIST and CIFAR-10, respectively, compared with retraining from scratch. Although the model's performance degrades slightly after unlearning, the reduction is negligible when only a small number of images (e.g., 64) are unlearned, and it has no adverse effect on downstream tasks such as classification.
93.7 · CR · Apr 10 · Code
Unreal Thinking: Chain-of-Thought Hijacking via Two-stage Backdoor
Wenhan Chang, Tianqing Zhu, Ping Xiong et al.
Large Language Models (LLMs) are increasingly deployed in settings where Chain-of-Thought (CoT) is interpreted by users. This creates a new safety risk: attackers may manipulate the model's observable CoT to induce malicious behavior. In open-weight ecosystems, such manipulation can be embedded in lightweight adapters that are easy to distribute and attach to base models. In practice, persistent CoT hijacking faces three main challenges: the difficulty of directly hijacking CoT tokens within a single long, continuous CoT-and-output sequence while keeping downstream outputs stable, the scarcity of malicious CoT data, and the instability of naive backdoor injection methods. To address the data scarcity issue, we propose Multiple Reverse Tree Search (MRTS), a reverse synthesis procedure that constructs output-aligned CoTs from prompt-output pairs without directly eliciting malicious CoTs from aligned models. Building on MRTS, we introduce Two-stage Backdoor Hijacking (TSBH), which first induces a trigger-conditioned mismatch between intermediate CoT and malicious outputs, and then fine-tunes the model on MRTS-generated CoTs that have a lower embedding distance to the malicious outputs, thereby ensuring stronger semantic similarity. Experiments across multiple open-weight models demonstrate that our method successfully induces trigger-activated CoT hijacking while maintaining a quantifiable distinction between hijacked and baseline states under our evaluation framework. We further explore a reasoning-based mitigation approach and release a safety-reasoning dataset to support future research on safety-aware and reliable reasoning. Our code is available at https://github.com/ChangWenhan/TSBH_official.
CL · Jun 20, 2025 · Code
From Thinking to Output: Chain-of-Thought and Text Generation Characteristics in Reasoning Language Models
Junhao Liu, Zhenhao Xu, Yuxin Fang et al.
Recently, there have been notable advancements in large language models (LLMs), demonstrating their growing abilities in complex reasoning. However, existing research lacks a thorough, systematic comparison of these models' reasoning processes and outputs, particularly regarding their self-reflection patterns (also termed "Aha moments") and the interconnections across diverse domains. This paper proposes a novel framework for analyzing the reasoning characteristics of four cutting-edge large reasoning models (GPT-o1, DeepSeek-R1, Kimi-k1.5, and Grok-3) using keyword statistics and the LLM-as-a-judge paradigm. Our approach connects their internal thinking processes with their final outputs. The evaluation uses a diverse dataset of real-world, scenario-based questions covering logical deduction, causal inference, and multi-step problem-solving. Additionally, we put forward a set of metrics to assess both the coherence of reasoning and the accuracy of the outputs. The results reveal various patterns in how these models balance exploration and exploitation, approach problems, and reach conclusions during reasoning. Through quantitative and qualitative comparisons, we identify disparities among these models in the depth of reasoning, the reliance on intermediate steps, and the degree to which their thinking processes and output patterns resemble those of GPT-o1. This work offers insights into the trade-off between computational efficiency and reasoning robustness and provides practical recommendations for improving model design and evaluation in real-world applications. We publicly release our project at: https://github.com/ChangWenhan/FromThinking2Output
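The abstract names keyword statistics as one of its two analysis tools. As a minimal sketch, assuming a hand-picked list of self-reflection markers (the `REFLECTION_MARKERS` tuple and both helper functions are hypothetical; the paper's actual keyword set is not given in the abstract), counting markers in a reasoning trace could look like this:

```python
import re
from collections import Counter

# Hypothetical marker list for illustration; the paper's keyword set is not reproduced here.
REFLECTION_MARKERS = ("wait", "hmm", "let me double-check", "on second thought", "actually")


def reflection_counts(cot: str, markers=REFLECTION_MARKERS) -> Counter:
    """Count case-insensitive, whole-word occurrences of each marker in a reasoning trace."""
    text = cot.lower()
    counts = Counter()
    for marker in markers:
        # \b on both sides keeps "wait" from matching inside "awaiting"
        counts[marker] = len(re.findall(r"\b" + re.escape(marker) + r"\b", text))
    return counts


def reflection_rate(cot: str, markers=REFLECTION_MARKERS) -> float:
    """Markers per 100 whitespace-separated tokens, so traces of different lengths are comparable."""
    n_tokens = len(cot.split())
    if n_tokens == 0:
        return 0.0
    return 100.0 * sum(reflection_counts(cot, markers).values()) / n_tokens
```

Normalizing by trace length is one way to compare verbose and terse models on equal footing; the LLM-as-a-judge half of the framework is not sketched here.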
80.4 · CR · May 12
Safety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysis
Zhenhao Xu, Wenhan Chang, Yichuan Chen et al.
Large Reasoning Models (LRMs) improve performance on complex tasks, but they also make safety control harder at deployment time. In black-box settings, defenders cannot modify model weights and must instead intervene at inference time. This setting creates three practical challenges: harmful intent may be hidden by educational or role-play framing, deep safety analysis can introduce non-trivial latency, and long adversarial contexts can dilute the local cues that simpler filters rely on. These challenges can expose an apparent thinking-output gap, where the model appears cautious during reasoning but still produces an unsafe final answer. To address this problem, we propose Safety Context Injection (SCI), an inference-time framework that separates safety assessment from task generation and prepends a structured external risk report as injected safety context for the protected model. The framework is instantiated in two complementary variants: Static Model Filtering (SMF), a lightweight one-pass guard for fast deployment, and Dynamic Agents Filtering (DAF), an agentic-loop-based analyzer that iteratively gathers and synthesizes evidence for ambiguous or long-context attacks. On AdvBench and GPTFuzz, with both base and reasoning models under five jailbreak families, both variants reduce attack success rate and toxicity in the evaluated settings. SMF offers an efficient low-latency option, while DAF is more effective when harmful intent is semantically disguised or dispersed across long contexts.
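The structural idea in SCI is that safety assessment runs separately from generation and its result is prepended as context. A minimal sketch of that flow, assuming a keyword matcher as a stand-in for the SMF guard (the `RISK_CUES` list, the `RiskReport` fields, and the report format are all hypothetical, not the paper's), might be:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical cue list standing in for a guard model's judgement; not the paper's SMF.
RISK_CUES = ("steal credentials", "bypass authentication", "disable safety interlocks")


@dataclass
class RiskReport:
    risk_level: str                              # "low", "medium", or "high"
    matched_cues: List[str] = field(default_factory=list)
    guidance: str = ""

    def render(self) -> str:
        """Format the report as a structured block to prepend to the protected model's input."""
        cues = ", ".join(self.matched_cues) if self.matched_cues else "none"
        return (
            "[SAFETY CONTEXT]\n"
            f"Risk level: {self.risk_level}\n"
            f"Matched cues: {cues}\n"
            f"Guidance: {self.guidance}\n"
            "[END SAFETY CONTEXT]"
        )


def static_filter(user_prompt: str) -> RiskReport:
    """One-pass, lightweight assessment that never calls the protected model."""
    text = user_prompt.lower()
    hits = [cue for cue in RISK_CUES if cue in text]
    if not hits:
        return RiskReport("low", hits, "No risk cues found; answer normally.")
    level = "high" if len(hits) > 1 else "medium"
    return RiskReport(level, hits, "Decline the harmful parts, even under educational or role-play framing.")


def inject_safety_context(user_prompt: str) -> str:
    """Assessment and generation stay separate: the report is simply prepended as context."""
    return static_filter(user_prompt).render() + "\n\n" + user_prompt
```

In a DAF-style variant, `static_filter` would be replaced by an iterative analysis loop that gathers evidence before writing the report, while the prepend step stays the same.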
LG · May 24, 2024
Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning
Wenhan Chang, Tianqing Zhu, Heng Xu et al.
In the current AI era, users may ask AI companies to delete their data from training datasets because of privacy concerns. For a model owner, retraining a model consumes significant computational resources. Machine unlearning has therefore emerged as a technology that allows model owners to delete requested training data or an entire class with little effect on model performance. However, for large-scale complex data, such as images or text, unlearning a class from a model often degrades performance because it is difficult to identify the link between a class and the model. Inaccurate class deletion may lead to over- or under-unlearning. In this paper, to accurately define the unlearning class for complex data, we use the notion of a concept, rather than an image feature or a text token, to represent the semantic information of the unlearning class. This representation can cut the link between the model and the class, leading to complete erasure of the class's influence. To analyze the impact of concepts in complex data, we adopt a Post-hoc Concept Bottleneck Model and Integrated Gradients to precisely identify concepts across different classes. Next, we leverage data poisoning with random and targeted labels to design unlearning methods. We test our methods on both image classification models and large language models (LLMs). The results consistently show that the proposed methods accurately erase targeted information from models while largely maintaining model performance.
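The abstract names data poisoning with random and targeted labels as the unlearning mechanism. A minimal sketch of the relabeling step at class granularity, assuming integer class labels (`poison_labels` and its parameters are hypothetical; the paper works on identified concepts, and the subsequent fine-tuning on the poisoned set is not shown), could be:

```python
import random


def poison_labels(labels, forget_class, num_classes, mode="random", target=None, seed=0):
    """Relabel every sample of `forget_class`; all other samples keep their labels.

    mode="random":   each forgotten sample gets a uniformly random label != forget_class.
    mode="targeted": each forgotten sample gets the fixed label `target`.
    """
    rng = random.Random(seed)  # seeded so the poisoned set is reproducible
    other = [c for c in range(num_classes) if c != forget_class]
    if mode == "targeted":
        if target is None or target == forget_class or not 0 <= target < num_classes:
            raise ValueError("targeted mode needs a valid target different from forget_class")
    elif mode != "random":
        raise ValueError(f"unknown mode: {mode}")

    poisoned = []
    for y in labels:
        if y != forget_class:
            poisoned.append(y)                  # retained data is left untouched
        elif mode == "random":
            poisoned.append(rng.choice(other))  # scatter the class across the others
        else:
            poisoned.append(target)             # collapse the class onto one target
    return poisoned
```

Random labels spread the forgotten class's mass across the remaining classes, whereas a targeted label redirects it to one chosen class; which is preferable likely depends on how downstream behavior on the forgotten inputs should look.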
LG · Oct 31, 2024
Zero-shot Class Unlearning via Layer-wise Relevance Analysis and Neuronal Path Perturbation
Wenhan Chang, Tianqing Zhu, Ping Xiong et al.
With the rapid advancement of artificial intelligence, privacy protection has become crucial, giving rise to machine unlearning. Machine unlearning is a technique that removes the influence of specific data from trained models without extensive retraining. However, it faces several key challenges, including implementing unlearning accurately, ensuring privacy protection during the unlearning process, and achieving effective unlearning without significantly compromising model performance. This paper presents a novel approach to machine unlearning based on Layer-wise Relevance Analysis and Neuronal Path Perturbation. We address three primary challenges: the lack of detailed unlearning principles, privacy guarantees in the zero-shot unlearning scenario, and the balance between unlearning effectiveness and model utility. Our method balances unlearning performance and model utility by identifying and perturbing highly relevant neurons, thereby achieving effective unlearning. By using data not present in the original training set during the unlearning process, we meet the requirements of the zero-shot unlearning scenario and ensure robust privacy protection. Experimental results demonstrate that our approach effectively removes the targeted data from the model while maintaining its utility, offering a practical solution for privacy-preserving machine learning.
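To make "identifying and perturbing highly relevant neurons" concrete, here is a sketch restricted to the final linear layer. It assumes relevance is the basic LRP rule applied to that layer and that perturbation means replacing the top-k neurons' weights to the target class with Gaussian noise; both helpers are hypothetical simplifications rather than the paper's procedure. In the zero-shot setting, `acts` would come from proxy inputs outside the original training set.

```python
import numpy as np


def neuron_relevance(acts, W, target):
    """Mean relevance of each hidden neuron j for class `target`.

    acts: (n_samples, n_hidden) activations feeding a final linear layer W: (n_hidden, n_classes).
    R_j = a_j * W[j, target] is neuron j's contribution to the target logit; with zero bias
    and the target logit as output relevance, this is exactly what basic LRP (LRP-0) assigns.
    """
    return (acts * W[:, target]).mean(axis=0)


def perturb_top_neurons(W, relevance, target, k, noise_scale=1.0, seed=0):
    """Return a copy of W whose k most relevant neurons have their weights to `target` replaced by noise."""
    rng = np.random.default_rng(seed)
    top = np.argsort(relevance)[-k:]  # indices of the k largest relevance scores
    W_new = W.copy()                  # leave the original weights untouched
    W_new[top, target] = rng.normal(0.0, noise_scale, size=k)
    return W_new, top
```

Only the paths from the selected neurons to the target class are altered, which is one way to limit collateral damage to the other classes; a full implementation would propagate relevance through earlier layers as well.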
CL · Aug 27, 2025
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
Jie Zhang, Changzai Pan, Kaiwen Wei et al.
Extensive research has been conducted to explore the capabilities of large language models (LLMs) in table reasoning. However, the essential task of transforming table information into reports remains a significant challenge for industrial applications. This task is plagued by two critical issues: 1) the complexity and diversity of tables lead to suboptimal reasoning outcomes; and 2) existing table benchmarks cannot adequately assess the practical application of this task. To fill this gap, we propose the table-to-report task and construct a bilingual benchmark named T2R-bench, in which key information flows from the tables to the reports. The benchmark comprises 457 industrial tables, all derived from real-world scenarios and spanning 19 industry domains and 4 types of industrial tables. Furthermore, we propose evaluation criteria to fairly measure the quality of report generation. Experiments on 25 widely used LLMs reveal that even state-of-the-art models such as DeepSeek-R1 achieve an overall score of only 62.71, indicating that LLMs still have room for improvement on T2R-bench.
CR · Dec 8, 2024
Large Language Models Merging for Enhancing the Link Stealing Attack on Graph Neural Networks
Faqian Guan, Tianqing Zhu, Wenhan Chang et al.
Graph Neural Networks (GNNs), designed specifically to process graph data, have achieved remarkable success in various applications. Link stealing attacks on graph data pose a significant privacy threat, as attackers aim to extract sensitive relationships between nodes (entities), potentially enabling academic misconduct, fraudulent transactions, or other malicious activities. Previous studies have focused primarily on single datasets and have not explored cross-dataset attacks, let alone attacks that leverage the combined knowledge of multiple attackers. However, we find that an attacker can combine the data knowledge of multiple attackers to build a more effective attack model, which we refer to as a cross-dataset attack. Moreover, the attack becomes even stronger when Large Language Models (LLMs) help extract this knowledge. In this paper, we propose a novel link stealing attack method that leverages cross-dataset knowledge and LLMs. The LLM is used to process datasets with different data structures in cross-dataset attacks. Each attacker fine-tunes the LLM on their own dataset to produce a tailored attack model. We then introduce a novel model merging method to effectively integrate the parameters of these attacker-specific models. The result is a merged attack model with superior generalization capabilities, enabling effective attacks not only on the attackers' datasets but also on previously unseen (out-of-domain) datasets. We conducted extensive experiments on four datasets to demonstrate the effectiveness of our method. Additional experiments with three different GNN and LLM architectures further illustrate the generality of our approach.
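The abstract does not specify its merging rule. For orientation only, the simplest baseline such a method would improve on is element-wise weighted averaging of parameters from models fine-tuned from the same base LLM. The sketch below (hypothetical `merge_state_dicts`, with parameters held as NumPy arrays keyed by name) shows that baseline, not the paper's method:

```python
from typing import Dict, List, Optional

import numpy as np


def merge_state_dicts(models: List[Dict[str, np.ndarray]],
                      weights: Optional[List[float]] = None) -> Dict[str, np.ndarray]:
    """Weighted element-wise average of same-named, same-shaped parameters across models."""
    if not models:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0 / len(models)] * len(models)  # plain uniform average
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)  # normalize so weights need not sum to 1

    merged = {}
    for name in models[0]:
        shapes = {m[name].shape for m in models}
        if len(shapes) != 1:
            raise ValueError(f"shape mismatch for {name}: {shapes}")
        merged[name] = sum(w * m[name] for w, m in zip(weights, models)) / total
    return merged
```

Averaging only makes sense when all models share one architecture and initialization, which is why the sketch assumes a common base LLM; more elaborate merging schemes typically reweight or sparsify the per-model parameter deltas instead of averaging raw weights.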
AI · Jul 10, 2025
TableReasoner: Advancing Table Reasoning Framework with Large Language Models
Sishi Xiong, Dakai Wang, Yu Zhao et al.
This paper presents our system for table question answering (TQA). TQA tasks are challenging because of the characteristics of real-world tabular data, such as large size, incomplete column semantics, and entity ambiguity. To address these issues, we propose TableReasoner, an LLM-powered, programming-based table reasoning framework. It models a table with a schema that combines structural and semantic representations, enabling holistic understanding and efficient processing of large tables. We design a multi-step schema linking plan to derive a focused table schema that retains only query-relevant information, eliminating ambiguity and alleviating hallucinations. This focused table schema provides precise and sufficient table details for query refinement and programming. Furthermore, we integrate the reasoning workflow into an iterative thinking architecture, allowing incremental cycles of thinking, reasoning, and reflection. Our system achieved first place in both subtasks of SemEval-2025 Task 8.
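TableReasoner derives its focused schema with an LLM-driven, multi-step linking plan. As a much simpler lexical stand-in that only illustrates the idea of keeping query-relevant columns, assuming each column carries a short semantic description (`focused_schema`, `STOPWORDS`, and the overlap scoring are hypothetical), one could write:

```python
import re
from typing import Dict, List, Set

# Hypothetical stopword list so that function words do not count as matches.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "what", "which", "per", "for", "to"}


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; underscores split, so "unit_price" -> {"unit", "price"}."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def focused_schema(question: str, schema: Dict[str, str], keep_at_least: int = 1) -> List[str]:
    """Return the columns whose name or description shares a token with the question.

    `schema` maps column name -> short semantic description. If fewer than
    `keep_at_least` columns match, fall back to the highest-overlap columns.
    """
    q = tokenize(question)
    scored = [(len(q & (tokenize(col) | tokenize(desc))), col) for col, desc in schema.items()]
    kept = [col for score, col in scored if score > 0]
    if len(kept) < keep_at_least:
        # sorted() is stable, so ties keep the original column order
        ranked = sorted(scored, key=lambda pair: -pair[0])
        kept = [col for _, col in ranked[:keep_at_least]]
    return kept
```

For a sales table with columns `region`, `unit_price`, `qty`, and `store_id`, the question "What is the total revenue per region given price and units sold?" keeps the first three and drops `store_id`, which is the kind of pruning that shrinks the context passed to later reasoning and code-generation steps.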
CR · May 23, 2025
Chain-of-Lure: A Universal Jailbreak Attack Framework using Unconstrained Synthetic Narratives
Wenhan Chang, Tianqing Zhu, Yu Zhao et al.
In the era of rapid generative AI development, interactions with large language models (LLMs) pose increasing risks of misuse. Prior research has focused primarily on attacks using template-based prompts and optimization-oriented methods, while overlooking the fact that LLMs themselves have strong, unconstrained deceptive capabilities that can be used to attack other LLMs. This paper introduces a novel jailbreaking method inspired by the Chain-of-Thought mechanism. The attacker employs mission transfer to conceal harmful user intent within dialogue and generates a progressive chain of lure questions without relying on predefined templates, enabling successful jailbreaks. To further strengthen the attack, we incorporate a helper LLM that performs randomized narrative optimization over multi-turn interactions, improving attack performance while preserving alignment with the original intent. We also propose a toxicity-based framework that uses third-party LLMs to evaluate harmful content and its alignment with malicious intent. Extensive experiments demonstrate that our method consistently achieves high attack success rates and elevated toxicity scores across diverse types of LLMs under black-box API settings. These findings reveal the intrinsic potential of LLMs to perform unrestricted attacks in the absence of robust alignment constraints. Our approach offers data-driven insights to inform the design of future alignment mechanisms. Finally, we propose two concrete defense strategies to support the development of safer generative models.