BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models
This work exposes a security vulnerability in widely deployed LLMs, particularly in shared MLaaS environments, by demonstrating stealthy and realistic attacks that could increase operational costs for all users.
The paper tackles the problem of inference cost attacks on large language models (LLMs) by introducing a bit-flip attack that modifies model weights to induce persistent overhead for all users, achieving scalable cost inflation with only a few bit flips across 11 LLMs.
Large language models (LLMs) are widely deployed, but their growing compute demands expose them to inference cost attacks that maximize output length. We show that prior attacks are fundamentally self-targeting: because they rely on crafted inputs, the added cost accrues to the attacker's own queries and scales poorly in practice. In this work, we introduce the first bit-flip inference cost attack, which directly modifies model weights to induce persistent overhead for all users of a compromised LLM. Such attacks are stealthy yet realistic in practice: for instance, in shared MLaaS environments, co-located tenants can exploit hardware-level faults (e.g., Rowhammer) to flip memory bits storing model parameters. We instantiate this attack paradigm with BitHydra, which (1) minimizes a loss that suppresses the end-of-sequence (EOS) token and (2) employs an efficient critical-bit search focused on the EOS embedding vector, sharply reducing the search space while preserving benign-looking outputs. We evaluate across 11 LLMs (1.5B-14B) under int8 and float16, demonstrating that our method achieves scalable cost inflation with only a few bit flips while remaining effective even against potential defenses.
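To make the mechanism concrete, the following minimal sketch shows how a single bit flip changes a float16 or int8 weight, and how one flip in an EOS weight row can sharply lower the EOS logit. It is an illustration under stated assumptions, not BitHydra's procedure: the EOS logit is assumed to be the dot product of a hidden state with the EOS row, the brute-force loop stands in for the paper's loss-guided critical-bit search, and all function names and toy values are hypothetical.

```python
import math
import struct


def flip_bit_float16(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 15 = sign) of an IEEE 754 half."""
    (raw,) = struct.unpack("<H", struct.pack("<e", value))
    return struct.unpack("<e", struct.pack("<H", raw ^ (1 << bit)))[0]


def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit (0..7) of a two's-complement int8 value."""
    raw = (value & 0xFF) ^ (1 << bit)
    return raw - 256 if raw >= 128 else raw


def eos_logit(hidden, eos_row):
    """Assumed model: EOS logit = dot(hidden state, EOS weight row)."""
    return sum(h * w for h, w in zip(hidden, eos_row))


def best_single_flip(hidden, eos_row):
    """Brute-force stand-in for a critical-bit search (illustration only).

    Tries every bit of every float16 weight in the EOS row and returns
    (logit, weight_index, bit) for the flip giving the lowest finite logit.
    """
    best = (eos_logit(hidden, eos_row), None, None)
    for i, w in enumerate(eos_row):
        for bit in range(16):
            flipped = flip_bit_float16(w, bit)
            if not math.isfinite(flipped):
                continue  # skip flips that produce inf or NaN
            trial = eos_row[:i] + [flipped] + eos_row[i + 1:]
            logit = eos_logit(hidden, trial)
            if logit < best[0]:
                best = (logit, i, bit)
    return best


if __name__ == "__main__":
    # Toy values, exactly representable in float16 (hypothetical data).
    hidden = [1.0, -2.0, 0.5]
    eos_row = [0.5, 0.25, -0.125]
    print("original EOS logit:", eos_logit(hidden, eos_row))
    print("best single flip (logit, index, bit):", best_single_flip(hidden, eos_row))
    print("int8 weight 1 with bit 7 flipped:", flip_bit_int8(1, 7))
```

In this toy setting, flipping the top exponent bit of one small float16 weight moves it by several orders of magnitude. This gives some intuition for why a few well-chosen flips could suppress EOS, although a practical attack must also keep the remaining outputs benign-looking, which this sketch does not model.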