AI · May 1

The Quantization Trap: Breaking Linear Scaling Laws in Multi-Hop Reasoning

Henry Han, Xiyang Liu, Xiaodong Wang, Fei Han, Xiaodong Li

arXiv:2602.13595 · 61.92 citations · h-index: 10

Predicted impact: top 61% in AI · last 90 days · Originality: Highly original

AI Analysis

For AI practitioners deploying quantized models for complex reasoning, the work argues that the common 'smaller-is-better' heuristic is counterproductive in multi-hop reasoning, and it provides a theoretical framework to predict when quantization is beneficial.

The paper identifies a 'quantization trap' where reducing numerical precision from 16-bit to 8/4-bit increases net energy consumption and degrades accuracy in multi-hop reasoning, breaking linear scaling laws. The trap is attributed to hardware casting overhead and sequential energy amortization failure, validated across 0.6B-72B models on six GPU architectures.

Neural scaling laws provide a predictable recipe for AI advancement: reducing numerical precision should linearly improve computational efficiency and energy profile ($E \propto \mathrm{bits}$). In this paper, we demonstrate that this scaling law breaks in the context of multi-hop reasoning. We reveal a 'quantization trap' where reducing precision from 16-bit to 8/4-bit paradoxically increases net energy consumption while degrading reasoning accuracy. We provide a rigorous theoretical decomposition that attributes this failure to two sources: hardware casting overhead (the hidden latency cost of dequantization kernels, which becomes a dominant bottleneck in sequential reasoning chains) and a sequential energy amortization failure. As a result, scaling law breaking is unavoidable in practice. We formalize a Critical Model Scale $N^*$ that predicts when the trap dissolves or deepens as a function of model size, batch size, and hardware configuration, validated across a 120$\times$ range (0.6B--72B) on six GPU architectures. Our findings suggest that the industry's "smaller-is-better" heuristic is mathematically counterproductive for complex reasoning tasks.
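To make the abstract's central claim concrete, the sketch below contrasts the idealized linear law ($E \propto \mathrm{bits}$) with a toy model that adds a fixed per-parameter casting cost whenever precision drops below the native 16-bit format. This is not the paper's decomposition, and it does not model $N^*$, batch size, or hardware. The function names, the `e_cast` parameter, and all constants are hypothetical, and energy is in arbitrary units.

```python
# Toy illustration only: NOT the paper's model. All names and constants are
# hypothetical, and energy is expressed in arbitrary units.

def linear_energy(bits, n_params, n_steps, e_per_bit=1.0):
    """Idealized linear law: energy grows in proportion to bit width."""
    return n_steps * n_params * e_per_bit * bits


def energy_with_cast(bits, n_params, n_steps, e_per_bit=1.0,
                     e_cast=0.0, native_bits=16):
    """Linear law plus an assumed casting (dequantization) cost.

    The cost e_cast is paid per parameter on every sequential step, but only
    when running below the native precision.
    """
    energy = linear_energy(bits, n_params, n_steps, e_per_bit)
    if bits < native_bits:
        energy += n_steps * n_params * e_cast
    return energy


def breakeven_cast_cost(bits, e_per_bit=1.0, native_bits=16):
    """Casting cost above which the quantized run uses more energy than native."""
    return e_per_bit * (native_bits - bits)


if __name__ == "__main__":
    for e_cast in (4.0, 10.0):
        e16 = energy_with_cast(16, n_params=1, n_steps=1, e_cast=e_cast)
        e8 = energy_with_cast(8, n_params=1, n_steps=1, e_cast=e_cast)
        verdict = "costs more than" if e8 > e16 else "costs less than"
        print(f"e_cast={e_cast}: 8-bit ({e8}) {verdict} 16-bit ({e16})")
```

In this toy model the quantized run loses its energy advantage once the assumed casting cost exceeds `e_per_bit * (native_bits - bits)`. Because the cost is paid again on every sequential step, longer reasoning chains do not amortize it.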
