CL · AI · LG · Apr 21

From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization

arXiv:2604.19884 · 24.3 · h-index: 20
Predicted impact: top 27% in CL · last 90 days
Originality: Incremental advance
AI Analysis

The paper provides a diagnostic framework for quantization failures in LLM deployment. The advance is incremental because it builds on existing PTQ methods.

The paper addresses catastrophic performance drops in 2-bit quantization of Large Language Models. It identifies two distinct failure modes, Signal Degradation and Computation Collapse, and shows that targeted repair can mitigate the former but not the latter.

Post-Training Quantization (PTQ) is critical for the efficient deployment of Large Language Models (LLMs). While 4-bit quantization is widely regarded as an optimal trade-off, reducing the precision to 2-bit usually triggers a catastrophic "performance cliff." It remains unclear whether the underlying mechanisms differ fundamentally. Consequently, we conduct a systematic mechanistic analysis, revealing two qualitatively distinct failure modes: Signal Degradation, where the computational patterns remain intact but information precision is impaired by cumulative error; and Computation Collapse, where key components fail to function, preventing correct information processing and destroying the signal in the early layers. Guided by this diagnosis, we conduct mechanism-aware interventions, demonstrating that targeted, training-free repair can mitigate Signal Degradation, but remains ineffective for Computation Collapse. Our findings provide a systematic diagnostic framework for PTQ failures and suggest that addressing Computation Collapse requires structural reconstruction rather than mere compensation.
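
To make the bit-width gap mentioned in the abstract concrete, the following is a minimal sketch of plain round-to-nearest uniform quantization applied to synthetic weights at 8, 4, and 2 bits. It is not the paper's method or its diagnostic procedure. The function names `quantize_rtn` and `relative_error` are hypothetical, and Gaussian random values are assumed as a stand-in for a real weight row. The sketch only illustrates how coarse a 2-bit grid (four levels) is compared with a 4-bit grid (sixteen levels).

```python
import random

def quantize_rtn(weights, bits):
    """Round-to-nearest uniform quantization over the min/max range (illustrative only)."""
    levels = 2 ** bits - 1                      # number of steps between min and max
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Map each weight to an integer code in [0, levels], then back to a float.
    return [lo + round((w - lo) / scale) * scale for w in weights]

def relative_error(original, approx):
    """L2 norm of the quantization error divided by the L2 norm of the original."""
    num = sum((a - b) ** 2 for a, b in zip(original, approx)) ** 0.5
    den = sum(a ** 2 for a in original) ** 0.5
    return num / den

random.seed(0)
# Assumption: synthetic Gaussian values standing in for one row of LLM weights.
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

errors = {bits: relative_error(weights, quantize_rtn(weights, bits)) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit relative error: {err:.4f}")
```

The per-tensor error grows sharply as the bit width shrinks. Whether that error shows up as Signal Degradation or Computation Collapse is what the paper's mechanistic analysis addresses, and this sketch does not attempt to reproduce that analysis.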

