43.5CVMay 18
CATA: Continual Machine Unlearning via Conflict-Averse Task ArithmeticShen Lin, Junhao Dong, Rongjie Chen et al.
Vision-language models (VLMs) have shown remarkable ability in aligning visual and textual representations, enabling a wide range of multimodal applications. However, their large-scale training data inevitably raises concerns about privacy, copyright, and undesirable content, creating a strong need for machine unlearning. While existing studies mainly focus on single-shot unlearning, practical VLM deployment often involves sequential removal requests over time, giving rise to continual machine unlearning. In this work, we make the first attempt to study continual unlearning for VLMs and identify three key challenges in this setting: effectiveness in removing target knowledge, fidelity in preserving retained model utility, and persistence in preventing knowledge re-emergence under sequential updates. To address these challenges, we propose CATA, a conflict-averse task arithmetic method that represents each forget request as an unlearning task vector. By maintaining historical task vectors and performing sign-aware conflict-averse aggregation, CATA suppresses conflicting update components that may weaken previous forgetting effects. Extensive experiments under both single-shot and continual settings show that CATA outperforms baselines in terms of forgetting effectiveness, model fidelity, and forgetting persistence.
CRDec 17, 2023Code
SAME: Sample Reconstruction against Model Extraction AttacksYi Xie, Jie Zhang, Shiqian Zhao et al.
While deep learning models have shown significant performance across various domains, their deployment needs extensive resources and advanced computing infrastructure. As a solution, Machine Learning as a Service (MLaaS) has emerged, lowering the barriers for users to release or productize their deep learning models. However, previous studies have highlighted potential privacy and security concerns associated with MLaaS, and one primary threat is model extraction attacks. To address this, there are many defense solutions but they suffer from unrealistic assumptions and generalization issues, making them less practical for reliable protection. Driven by these limitations, we introduce a novel defense mechanism, SAME, based on the concept of sample reconstruction. This strategy imposes minimal prerequisites on the defender's capabilities, eliminating the need for auxiliary Out-of-Distribution (OOD) datasets, user query history, white-box model access, and additional intervention during model training. It is compatible with existing active defense methods. Our extensive experiments corroborate the superior efficacy of SAME over state-of-the-art solutions. Our code is available at https://github.com/xythink/SAME.
LGDec 30, 2022
Deep Hierarchy Quantization Compression algorithm based on Dynamic SamplingWan Jiang, Gang Liu, Xiaofeng Chen et al.
Unlike traditional distributed machine learning, federated learning stores data locally for training and then aggregates the models on the server, which solves the data security problem that may arise in traditional distributed machine learning. However, during the training process, the transmission of model parameters can impose a significant load on the network bandwidth. It has been pointed out that the vast majority of model parameters are redundant during model parameter transmission. In this paper, we explore the data distribution law of selected partial model parameters on this basis, and propose a deep hierarchical quantization compression algorithm, which further compresses the model and reduces the network load brought by data transmission through the hierarchical quantization of model parameters. And we adopt a dynamic sampling strategy for the selection of clients to accelerate the convergence of the model. Experimental results on different public datasets demonstrate the effectiveness of our algorithm.
CRMar 6Code
Depth Charge: Jailbreak Large Language Models from Deep Safety Attention HeadsJinman Wu, Yi Xie, Shiqian Zhao et al.
Currently, open-sourced large language models (OSLLMs) have demonstrated remarkable generative performance. However, as their structure and weights are made public, they are exposed to jailbreak attacks even after alignment. Existing attacks operate primarily at shallow levels, such as the prompt or embedding level, and often fail to expose vulnerabilities rooted in deeper model components, which creates a false sense of security for successful defense. In this paper, we propose \textbf{\underline{S}}afety \textbf{\underline{A}}ttention \textbf{\underline{H}}ead \textbf{\underline{A}}ttack (\textbf{SAHA}), an attention-head-level jailbreak framework that explores the vulnerability in deeper but insufficiently aligned attention heads. SAHA contains two novel designs. Firstly, we reveal that deeper attention layers introduce more vulnerability against jailbreak attacks. Based on this finding, \textbf{SAHA} introduces \textit{Ablation-Impact Ranking} head selection strategy to effectively locate the most vital layer for unsafe output. Secondly, we introduce a boundary-aware perturbation method, \textit{i.e. Layer-Wise Perturbation}, to probe the generation of unsafe content with minimal perturbation to the attention. This constrained perturbation guarantees higher semantic relevance with the target intent while ensuring evasion. Extensive experiments show the superiority of our method: SAHA improves ASR by 14\% over SOTA baselines, revealing the vulnerability of the attack surface on the attention head. Our code is available at https://anonymous.4open.science/r/SAHA.
LGDec 25, 2023
ShiftKD: Benchmarking Knowledge Distillation under Distribution ShiftSongming Zhang, Yuxiao Luo, Ziyu Lyu et al.
Knowledge Distillation (KD) transfers knowledge from large models to small models and has recently achieved remarkable success. However, the reliability of existing KD methods in real-world applications, especially under distribution shift, remains underexplored. Distribution shift refers to the data distribution drifts between the training and testing phases, and this can adversely affect the efficacy of KD. In this paper, we propose a unified and systematic framework \textsc{ShiftKD} to benchmark KD against two general distributional shifts: diversity and correlation shift. The evaluation benchmark covers more than 30 methods from algorithmic, data-driven, and optimization perspectives for five benchmark datasets. Our development of \textsc{ShiftKD} conducts extensive experiments and reveals strengths and limitations of current SOTA KD methods. More importantly, we thoroughly analyze key factors in student model training process, including data augmentation, pruning methods, optimizers, and evaluation metrics. We believe \textsc{ShiftKD} could serve as an effective benchmark for assessing KD in real-world scenarios, thus driving the development of more robust KD methods in response to evolving demands. The code will be made available upon publication.
CRMar 6
Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language ModelsJinman Wu, Yi Xie, Shen Lin et al.
Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the persistence of jailbreak attacks suggests a fundamental mechanistic decoupling. We propose the \textbf{\underline{D}}isentangled \textbf{\underline{S}}afety \textbf{\underline{H}}ypothesis \textbf{(DSH)}, positing that safety computation operates on two distinct subspaces: a \textit{Recognition Axis} ($\mathbf{v}_H$, ``Knowing'') and an \textit{Execution Axis} ($\mathbf{v}_R$, ``Acting''). Our geometric analysis reveals a universal ``Reflex-to-Dissociation'' evolution, where these signals transition from antagonistic entanglement in early layers to structural independence in deep layers. To validate this, we introduce \textit{Double-Difference Extraction} and \textit{Adaptive Causal Steering}. Using our curated \textsc{AmbiguityBench}, we demonstrate a causal double dissociation, effectively creating a state of ``Knowing without Acting.'' Crucially, we leverage this disentanglement to propose the \textbf{Refusal Erasure Attack (REA)}, which achieves State-of-the-Art attack success rates by surgically lobotomizing the refusal mechanism. Furthermore, we uncover a critical architectural divergence, contrasting the \textit{Explicit Semantic Control} of Llama3.1 with the \textit{Latent Distributed Control} of Qwen2.5. The code and dataset are available at https://anonymous.4open.science/r/DSH.
LGDec 25, 2023
Mixture Data for Training Cannot Ensure Out-of-distribution GeneralizationSongming Zhang, Yuxiao Luo, Qizhou Wang et al.
Deep neural networks often face generalization problems to handle out-of-distribution (OOD) data, and there remains a notable theoretical gap between the contributing factors and their respective impacts. Literature evidence from in-distribution data has suggested that generalization error can shrink if the size of mixture data for training increases. However, when it comes to OOD samples, this conventional understanding does not hold anymore -- Increasing the size of training data does not always lead to a reduction in the test generalization error. In fact, diverse trends of the errors have been found across various shifting scenarios including those decreasing trends under a power-law pattern, initial declines followed by increases, or continuous stable patterns. Previous work has approached OOD data qualitatively, treating them merely as samples unseen during training, which are hard to explain the complicated non-monotonic trends. In this work, we quantitatively redefine OOD data as those situated outside the convex hull of mixed training data and establish novel generalization error bounds to comprehend the counterintuitive observations better. Our proof of the new risk bound agrees that the efficacy of well-trained models can be guaranteed for unseen data within the convex hull; More interestingly, but for OOD data beyond this coverage, the generalization cannot be ensured, which aligns with our observations. Furthermore, we attempted various OOD techniques to underscore that our results not only explain insightful observations in recent OOD generalization work, such as the significance of diverse data and the sensitivity to unseen shifts of existing algorithms, but it also inspires a novel and effective data selection strategy.
CRMay 13, 2021
Privacy Inference Attacks and Defenses in Cloud-based Deep Neural Network: A SurveyXiaoyu Zhang, Chao Chen, Yi Xie et al.
Deep Neural Network (DNN), one of the most powerful machine learning algorithms, is increasingly leveraged to overcome the bottleneck of effectively exploring and analyzing massive data to boost advanced scientific development. It is not a surprise that cloud computing providers offer the cloud-based DNN as an out-of-the-box service. Though there are some benefits from the cloud-based DNN, the interaction mechanism among two or multiple entities in the cloud inevitably induces new privacy risks. This survey presents the most recent findings of privacy attacks and defenses appeared in cloud-based neural network services. We systematically and thoroughly review privacy attacks and defenses in the pipeline of cloud-based DNN service, i.e., data manipulation, training, and prediction. In particular, a new theory, called cloud-based ML privacy game, is extracted from the recently published literature to provide a deep understanding of state-of-the-art research. Finally, the challenges and future work are presented to help researchers to continue to push forward the competitions between privacy attackers and defenders.
CRMay 12, 2021
An Efficient Matrix Multiplication with Enhanced Privacy Protection in Cloud Computing and Its ApplicationsChun Liu, Xuexian Hu, Xiaofeng Chen et al.
As one of the most important basic operations, matrix multiplication computation (MMC) has varieties of applications in the scientific and engineering community such as linear regression, k-nearest neighbor classification and biometric identification. However handling these tasks with large-scale datasets will lead to huge computation beyond resource-constrained client s computation power. With the rapid development of cloud computing, outsourcing intensive tasks to cloud server has become a promising method. While the cloud server is generally out of the control of clients, there are still many challenges concerned with the privacy security of clients sensitive data. Motivated by this, Lei et al. presented an efficient encryption scheme based on random permutation to protect the privacy of client s data in outsourcing MMC task. Nevertheless, there exists inherent security flaws in their scheme, revealing the statistic information of zero elements in the original data thus not satisfying the computational indistinguishability (IND-ZEA). Aiming to enhance the security of the outsourcing MMC task, we propose a new encryption scheme based on subtly designed invertible matrix where the additive perturbation is introduced besides the multiplicative perturbation. Furthermore, we show that the proposed encryption scheme can be applied to not only MMC task but also other kinds of outsourced tasks such as linear regression and principal component analysis. Theoretical analyses and experiments indicate that our methods are more secure in terms of data privacy, with comparable performance to the state-of-the-art scheme based on matrix transformation.
SPOct 12, 2018
PatternListener: Cracking Android Pattern Lock Using Acoustic SignalsMan Zhou, Qian Wang, Jingxiao Yang et al.
Pattern lock has been widely used for authentication to protect user privacy on mobile devices (e.g., smartphones and tablets). Given its pervasive usage, the compromise of pattern lock could lead to serious consequences. Several attacks have been constructed to crack the lock. However, these approaches require the attackers to either be physically close to the target device or be able to manipulate the network facilities (e.g., WiFi hotspots) used by the victims. Therefore, the effectiveness of the attacks is significantly impacted by the environment of mobile devices. Also, these attacks are not scalable since they cannot easily infer unlock patterns of a large number of devices. Motivated by an observation that fingertip motions on the screen of a mobile device can be captured by analyzing surrounding acoustic signals on it, we propose PatternListener, a novel acoustic attack that cracks pattern lock by analyzing imperceptible acoustic signals reflected by the fingertip. It leverages speakers and microphones of the victim's device to play imperceptible audio and record the acoustic signals reflected by the fingertip. In particular, it infers each unlock pattern by analyzing individual lines that compose the pattern and are the trajectories of the fingertip. We propose several algorithms to construct signal segments according to the captured signals for each line and infer possible candidates of each individual line according to the signal segments. Finally, we map all line candidates into grid patterns and thereby obtain the candidates of the entire unlock pattern. We implement a PatternListener prototype by using off-the-shelf smartphones and thoroughly evaluate it using 130 unique patterns. The real experimental results demonstrate that PatternListener can successfully exploit over 90% patterns within five attempts.