LGJul 26, 2024
NeuSemSlice: Towards Effective DNN Model Maintenance via Neuron-level Semantic SlicingShide Zhou, Tianlin Li, Yihao Huang et al.
Deep Neural networks (DNNs), extensively applied across diverse disciplines, are characterized by their integrated and monolithic architectures, setting them apart from conventional software systems. This architectural difference introduces particular challenges to maintenance tasks, such as model restructure (e.g., model compression), re-adaptation (e.g., fitting new samples), and incremental development (e.g., continual knowledge accumulation). Prior research addresses these challenges by identifying task-critical neuron layers, and dividing neural networks into semantically-similar sequential modules. However, such layer-level approaches fail to precisely identify and manipulate neuron-level semantic components, restricting their applicability to finer-grained model maintenance tasks. In this work, we implement NeuSemSlice, a novel framework that introduces the semantic slicing technique to effectively identify critical neuron-level semantic components in DNN models for semantic-aware model maintenance tasks. Specifically, semantic slicing identifies, categorizes and merges critical neurons across different categories and layers according to their semantic similarity, enabling their flexibility and effectiveness in the subsequent tasks. For semantic-aware model maintenance tasks, we provide a series of novel strategies based on semantic slicing to enhance NeuSemSlice. They include semantic components (i.e., critical neurons) preservation for model restructure, critical neuron tuning for model re-adaptation, and non-critical neuron training for model incremental development. A thorough evaluation has demonstrated that NeuSemSlice significantly outperforms baselines in all three tasks.
88.2CRApr 5
Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State ForensicsShide Zhou, Kailong Wang, Ling Shi et al.
The widespread adoption of Large Language Models (LLMs) in critical applications has introduced severe reliability and security risks, as LLMs remain vulnerable to notorious threats such as hallucinations, jailbreak attacks, and backdoor exploits. These vulnerabilities have been weaponized by malicious actors, leading to unauthorized access, widespread misinformation, and compromised LLM-embedded system integrity. In this work, we introduce a novel approach to detecting abnormal behaviors in LLMs via hidden state forensics. By systematically inspecting layer-specific activation patterns, we develop a general framework that can efficiently identify a range of security threats in real-time without imposing prohibitive computational costs. Extensive experiments indicate detection accuracies exceeding 95% and consistently robust performance across multiple models in most scenarios, while preserving the ability to detect novel attacks effectively. Furthermore, the computational overhead remains minimal, with detector inference taking merely fractions of a second. The significance of this work lies in proposing a promising strategy to reinforce the security of LLM-integrated systems, paving the way for safer and more reliable deployment in high-stakes domains. By enabling real-time detection that can also support the mitigation of abnormal behaviors, it represents a meaningful step toward ensuring the trustworthiness of AI systems amid rising security challenges.
IRJul 26, 2025Code
OmniBench-RAG: A Multi-Domain Evaluation Platform for Retrieval-Augmented Generation ToolsJiaxuan Liang, Shide Zhou, Kailong Wang
While Retrieval Augmented Generation (RAG) is now widely adopted to enhance LLMs, evaluating its true performance benefits in a reproducible and interpretable way remains a major hurdle. Existing methods often fall short: they lack domain coverage, employ coarse metrics that miss sub document precision, and fail to capture computational trade offs. Most critically, they provide no standardized framework for comparing RAG effectiveness across different models and domains. We introduce OmniBench RAG, a novel automated platform for multi domain evaluation of RAG systems. The platform quantifies performance gains across accuracy and efficiency dimensions, spanning nine knowledge fields including culture, geography, and health. We introduce two standardized metrics: Improvements (accuracy gains) and Transformation (efficiency differences between pre RAG and post RAG models), enabling reproducible comparisons across models and tasks. The platform features dynamic test generation, modular evaluation pipelines, and automated knowledge base construction. Our evaluation reveals striking variability in RAG effectiveness, from significant gains in culture to declines in mathematics, highlighting the critical importance of systematic, domain aware assessment. A demonstration video is available at: https://www.youtube.com/watch?v=BZx83QFcTCI. Code and datasets: https://github.com/Garnett-Liang/Omnibench-RAG.
89.8CRApr 1
When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM FusionJiaqing Li, Zhibo Zhang, Shide Zhou et al.
Model merging has emerged as a powerful technique for combining specialized capabilities from multiple fine-tuned LLMs without additional training costs. However, the security implications of this widely-adopted practice remain critically underexplored. In this work, we reveal that model merging introduces a novel attack surface that can be systematically exploited to compromise safety alignment. We present TrojanMerge,, a framework that embeds latent malicious components into source models that remain individually benign but produce severely misaligned models when merged. Our key insight is formulating this attack as a constrained optimization problem: we construct perturbations that preserve source model safety through directional consistency constraints, maintain capabilities via Frobenius directional alignment constraints, yet combine during merging to form pre-computed attack vectors. Extensive experiments across 9 LLMs from 3 model families demonstrate that TrojanMerge, consistently achieves high harmful response rates in merged models while source models maintain safety scores comparable to unmodified versions. Our attack succeeds across diverse merging algorithms and remains effective under various hyperparameter configurations. These findings expose fundamental vulnerabilities in current model merging practices and highlight the urgent need for security-aware mechanisms.