Xiaofeng Chen

h-index: 26

3 papers

82 citations

Novelty: 50%

AI Score: 46

Ranked #39,149 of 194,257 authors (top 20%); #822 in CR (top 12%)

3 Papers

5.3 · CR · Mar 6 · Code

Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads

Jinman Wu, Yi Xie, Shiqian Zhao et al.

Currently, open-sourced large language models (OSLLMs) have demonstrated remarkable generative performance. However, as their structure and weights are made public, they are exposed to jailbreak attacks even after alignment. Existing attacks operate primarily at shallow levels, such as the prompt or embedding level, and often fail to expose vulnerabilities rooted in deeper model components, which creates a false sense of security about the success of defenses. In this paper, we propose Safety Attention Head Attack (SAHA), an attention-head-level jailbreak framework that explores the vulnerability in deeper but insufficiently aligned attention heads. SAHA contains two novel designs. First, we reveal that deeper attention layers introduce more vulnerability to jailbreak attacks. Based on this finding, SAHA introduces the Ablation-Impact Ranking head selection strategy to effectively locate the most vital layer for unsafe output. Second, we introduce a boundary-aware perturbation method, i.e., Layer-Wise Perturbation, to probe the generation of unsafe content with minimal perturbation to the attention. This constrained perturbation guarantees higher semantic relevance with the target intent while ensuring evasion. Extensive experiments show the superiority of our method: SAHA improves ASR by 14% over SOTA baselines, revealing the vulnerability of the attack surface on the attention head. Our code is available at https://anonymous.4open.science/r/SAHA.

6.6 · LG · Dec 25, 2023 · Code

ShiftKD: Benchmarking Knowledge Distillation under Distribution Shift

Songming Zhang, Yuxiao Luo, Ziyu Lyu et al.

Knowledge Distillation (KD) transfers knowledge from large models to small models and has recently achieved remarkable success. However, the reliability of existing KD methods in real-world applications, especially under distribution shift, remains underexplored. Distribution shift refers to drift in the data distribution between the training and testing phases, which can adversely affect the efficacy of KD. In this paper, we propose a unified and systematic framework, ShiftKD, to benchmark KD against two general distributional shifts: diversity shift and correlation shift. The evaluation benchmark covers more than 30 methods from algorithmic, data-driven, and optimization perspectives on five benchmark datasets. Using ShiftKD, we conduct extensive experiments that reveal the strengths and limitations of current SOTA KD methods. More importantly, we thoroughly analyze key factors in the student model training process, including data augmentation, pruning methods, optimizers, and evaluation metrics. We believe ShiftKD could serve as an effective benchmark for assessing KD in real-world scenarios, thus driving the development of more robust KD methods in response to evolving demands. The code will be made available upon publication.
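For readers unfamiliar with what "transferring knowledge" usually means in practice, the following is a minimal sketch of the widely used temperature-scaled distillation loss (KL divergence between softened teacher and student outputs). The abstract does not state which objectives the benchmarked methods use, so this formulation, the function names `softmax` and `kd_loss`, and the default temperature are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper: a common KD objective, not necessarily the one
    used by any method in the benchmark described above.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    print(kd_loss([4.0, 1.0, 0.5], teacher))  # identical logits give zero loss
    print(kd_loss([0.5, 1.0, 4.0], teacher))  # a mismatched student is penalized
```

Under distribution shift, the teacher's soft targets on test-time inputs may themselves be unreliable, which is one reason a loss like this can behave differently from what in-distribution evaluation suggests.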

1.2 · SP · Oct 12, 2018

PatternListener: Cracking Android Pattern Lock Using Acoustic Signals

Man Zhou, Qian Wang, Jingxiao Yang et al.

Pattern lock has been widely used for authentication to protect user privacy on mobile devices (e.g., smartphones and tablets). Given its pervasive usage, the compromise of pattern lock could lead to serious consequences. Several attacks have been constructed to crack the lock. However, these approaches require the attackers to either be physically close to the target device or be able to manipulate the network facilities (e.g., WiFi hotspots) used by the victims. Therefore, the effectiveness of the attacks is significantly impacted by the environment of mobile devices. Also, these attacks are not scalable since they cannot easily infer unlock patterns of a large number of devices. Motivated by the observation that fingertip motions on the screen of a mobile device can be captured by analyzing surrounding acoustic signals, we propose PatternListener, a novel acoustic attack that cracks pattern lock by analyzing imperceptible acoustic signals reflected by the fingertip. It leverages the speakers and microphones of the victim's device to play imperceptible audio and record the acoustic signals reflected by the fingertip. In particular, it infers each unlock pattern by analyzing the individual lines that compose the pattern, which are the trajectories of the fingertip. We propose several algorithms to construct signal segments from the captured signals for each line and to infer possible candidates for each individual line from the signal segments. Finally, we map all line candidates into grid patterns and thereby obtain the candidates for the entire unlock pattern. We implement a PatternListener prototype using off-the-shelf smartphones and thoroughly evaluate it using 130 unique patterns. The experimental results demonstrate that PatternListener can successfully exploit over 90% of the patterns within five attempts.
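To give a sense of the search space that the final "map line candidates into grid patterns" step operates over, the sketch below checks whether a sequence of dots is a legal unlock pattern and counts all legal patterns. It assumes the standard 3x3 Android grid and its usual rules (4 to 9 distinct dots, and a stroke may pass over a dot only if that dot was already visited); the abstract does not spell these out. The row-major numbering 0..8 and the helper names are hypothetical, and this is not the paper's candidate-mapping algorithm.

```python
def skipped_dot(a, b):
    """Return the dot exactly midway between dots a and b, or None."""
    (r1, c1), (r2, c2) = divmod(a, 3), divmod(b, 3)
    if (r1 + r2) % 2 == 0 and (c1 + c2) % 2 == 0:
        return ((r1 + r2) // 2) * 3 + (c1 + c2) // 2
    return None


def is_valid_pattern(dots):
    """Check a dot sequence against the assumed 3x3 pattern-lock rules."""
    if not 4 <= len(dots) <= 9 or len(set(dots)) != len(dots):
        return False
    if any(d not in range(9) for d in dots):
        return False
    visited = {dots[0]}
    for a, b in zip(dots, dots[1:]):
        mid = skipped_dot(a, b)
        if mid is not None and mid not in visited:
            return False  # stroke would jump over an unvisited dot
        visited.add(b)
    return True


def count_patterns():
    """Count every valid pattern by depth-first search from each start dot."""
    def extend(path, visited):
        total = 1 if len(path) >= 4 else 0
        for nxt in range(9):
            if nxt in visited:
                continue
            mid = skipped_dot(path[-1], nxt)
            if mid is not None and mid not in visited:
                continue
            total += extend(path + [nxt], visited | {nxt})
        return total

    return sum(extend([start], {start}) for start in range(9))


if __name__ == "__main__":
    print(is_valid_pattern([0, 1, 2, 5]))  # simple L-shaped pattern
    print(is_valid_pattern([0, 2, 1, 4]))  # invalid: 0 -> 2 jumps over dot 1
    print(count_patterns())
```

A filter like `is_valid_pattern` could be used to discard impossible combinations of inferred line candidates before ranking the remainder; succeeding within five attempts then amounts to the true pattern appearing among the five top-ranked survivors.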