Jeonghyun Woo

h-index15
2papers

2 Papers

37.3CRMay 17Code
Loaded Dice: Solving the Non-Selection Problem for Scalable Probabilistic RowHammer Defense

Jeonghyun Woo, Junsu Kim, Aamer Jaleel et al.

DRAM scaling has exacerbated the RowHammer vulnerability. To counter this, JEDEC recently introduced Per Row Activation Counting (PRAC) with the Alert Back-Off protocol as an optional DDR5 feature. While promising, PRAC requires per-row counter cells that incur area overhead, and updating them on every activation lengthens DRAM timing parameters, degrading performance. Probabilistic mitigations such as MINT offer a lower-cost alternative by randomly selecting and mitigating rows within periodic mitigation windows. MINT is effective at higher thresholds (>= 1000), but at lower thresholds, it must raise its mitigation rate to overcome the non-selection problem, where heavily hammered rows can repeatedly escape sampling. This fixed-rate scaling reduces effective memory bandwidth even when no attack is present. To overcome this limitation, we propose PrISM, an intersection-based probabilistic mitigation that correlates sampled rows across windows using a Sampled History Queue (SHQ). PrISM samples a few activation slots per window, stores sampled-but-unmitigated rows in the SHQ, and requests an additional mitigation through the existing Alert Back-Off protocol when a sampled row reappears in this history. This allows PrISM to increase mitigation only when persistent row activity is observed, without globally increasing the fixed mitigation rate. At the threshold of 500, PrISM incurs a negligible 0.2% average slowdown compared to 14% for PRAC, with no DRAM array changes or per-row counters and only 625B of SRAM per bank, one to two orders of magnitude less than prior secure counter-based in-DRAM defenses. Compared to MINT, PrISM provides better scalability at low thresholds, reducing average slowdown from 10.7% to 1.5% at a threshold of 250, a 7.1x reduction. PrISM is open-sourced at https://github.com/STAR-Laboratory/prism.

CRDec 10, 2024Code
PrisonBreak: Jailbreaking Large Language Models with at Most Twenty-Five Targeted Bit-flips

Zachary Coalson, Jeonghyun Woo, Chris S. Lin et al.

We study a new vulnerability in commercial-scale safety-aligned large language models (LLMs): their refusal to generate harmful responses can be broken by flipping only a few bits in model parameters. Our attack jailbreaks billion-parameter language models with just 5 to 25 bit-flips, requiring up to 40$\times$ fewer bit flips than prior attacks on much smaller computer vision models. Unlike prompt-based jailbreaks, our method directly uncensors models in memory at runtime, enabling harmful outputs without requiring input-level modifications. Our key innovation is an efficient bit-selection algorithm that identifies critical bits for language model jailbreaks up to 20$\times$ faster than prior methods. We evaluate our attack on 10 open-source LLMs, achieving high attack success rates (ASRs) of 80-98% with minimal impact on model utility. We further demonstrate an end-to-end exploit via Rowhammer-based fault injection, reliably jailbreaking 5 models (69-91% ASR) on a GDDR6 GPU. Our analyses reveal that: (1) models with weaker post-training alignment require fewer bit-flips to jailbreak; (2) certain model components, e.g., value projection layers, are substantially more vulnerable; and (3) the attack is mechanistically different from existing jailbreak methods. We evaluate potential countermeasures and find that our attack remains effective against defenses at various stages of the LLM pipeline.