Aimin Yu

CRApr 28, 2025

Prefill-level Jailbreak: A Black-Box Risk Analysis of Large Language Models

Yakai Li, Jiekang Hu, Weiduan Sang et al.

Large Language Models face security threats from jailbreak attacks. Existing research has predominantly focused on prompt-level attacks while largely ignoring the underexplored attack surface of user-controlled response prefilling. This functionality allows an attacker to dictate the beginning of a model's output, thereby shifting the attack paradigm from persuasion to direct state manipulation.In this paper, we present a systematic black-box security analysis of prefill-level jailbreak attacks. We categorize these new attacks and evaluate their effectiveness across fourteen language models. Our experiments show that prefill-level attacks achieve high success rates, with adaptive methods exceeding 99% on several models. Token-level probability analysis reveals that these attacks work through initial-state manipulation by changing the first-token probability from refusal to compliance.Furthermore, we show that prefill-level jailbreak can act as effective enhancers, increasing the success of existing prompt-level attacks by 10 to 15 percentage points. Our evaluation of several defense strategies indicates that conventional content filters offer limited protection. We find that a detection method focusing on the manipulative relationship between the prompt and the prefill is more effective. Our findings reveal a gap in current LLM safety alignment and highlight the need to address the prefill attack surface in future safety training.

CRApr 20, 2021

DeepHunter: A Graph Neural Network Based Approach for Robust Cyber Threat Hunting

Renzheng Wei, Lijun Cai, Aimin Yu et al.

Cyber Threat hunting is a proactive search for known attack behaviors in the organizational information system. It is an important component to mitigate advanced persistent threats (APTs). However, the attack behaviors recorded in provenance data may not be completely consistent with the known attack behaviors. In this paper, we propose DeepHunter, a graph neural network (GNN) based graph pattern matching approach that can match provenance data against known attack behaviors in a robust way. Specifically, we design a graph neural network architecture with two novel networks: attribute embedding networks that could incorporate Indicators of Compromise (IOCs) information, and graph embedding networks that could capture the relationships between IOCs. To evaluate DeepHunter, we choose five real and synthetic APT attack scenarios. Results show that DeepHunter can hunt all attack behaviors, and the accuracy and robustness of DeepHunter outperform the state-of-the-art method, Poirot.

Aimin Yu

2 Papers