LG · Apr 15

Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate

arXiv:2604.13806 · 67.9 · h-index: 3
Predicted impact: top 27% in LG · last 90 days · Originality: Incremental advance
AI Analysis

For practitioners deploying LLMs, DASH-Q enables more accurate ultra low-bit quantization, reducing memory footprint without retraining.

DASH-Q is a robust post-training quantization method for LLMs that uses a diagonal Hessian approximation and iterative weighted least squares. It outperforms baselines in ultra low-bit regimes, with zero-shot accuracy improvements of up to 14.01%.

Large Language Models (LLMs) are widely used across many domains, but their scale makes deployment challenging. Post-Training Quantization (PTQ) reduces memory footprint without retraining by leveraging a small calibration set. Recent Hessian-based PTQ methods compensate for quantization error via cross-channel dependencies, but such approaches degrade at low bit-widths because curvature estimates from limited calibration data are noisy. We propose DASH-Q, a robust PTQ framework that uses a diagonal Hessian approximation and iterative weighted least squares. By discarding noise-prone dependencies, DASH-Q filters sampling noise while prioritizing the preservation of salient feature power. We outperform other PTQ baselines in the ultra low-bit regime, improving zero-shot accuracy by 7.01% on average and by up to 14.01% over the strongest baselines across five LLMs, while showing robust and stable performance with very small calibration data.
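
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: keep just the diagonal of the layer Hessian (the mean squared activation of each input channel over the calibration set) and fit a quantization scale by repeatedly solving a weighted least-squares problem. Everything here is an assumption made for illustration, not the DASH-Q procedure itself: the function names `diag_hessian`, `weighted_error`, and `quantize_row`, the symmetric signed integer grid, the max-abs initialization, and the alternating round-then-refit loop.

```python
def diag_hessian(calib):
    """Diagonal of X^T X / n: mean squared activation per input channel.

    calib is a list of calibration activation vectors (hypothetical input).
    """
    n = len(calib)
    dim = len(calib[0])
    return [sum(x[j] ** 2 for x in calib) / n for j in range(dim)]


def weighted_error(w, w_hat, h):
    """Diagonal-Hessian-weighted squared error between w and w_hat."""
    return sum(hj * (a - b) ** 2 for hj, a, b in zip(h, w, w_hat))


def quantize_row(w, h, bits=2, iters=10):
    """Alternate between rounding to the grid and refitting the scale.

    Assumed scheme (not taken from the paper):
      1. fix the scale, round each weight to the nearest clipped integer;
      2. fix the integers, solve the weighted least-squares problem
         min_s sum_j h_j * (w_j - s * q_j)^2 in closed form.
    Each step cannot increase the weighted error.
    """
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    # Max-abs initialization; fall back to 1.0 for an all-zero row.
    scale = max(abs(v) for v in w) / max(qmax, 1) or 1.0
    q = [0] * len(w)
    for _ in range(iters):
        q = [min(qmax, max(qmin, round(v / scale))) for v in w]
        num = sum(hj * wj * qj for hj, wj, qj in zip(h, w, q))
        den = sum(hj * qj * qj for hj, qj in zip(h, q))
        if den == 0 or num <= 0:
            break  # degenerate case: keep the current scale
        scale = num / den
    return scale, q
```

Because the weights `h` are per-channel, channels with large calibration activations dominate the fit, which is one way to read "preserving salient feature power". Calling `quantize_row(w, diag_hessian(calib), bits=2)` returns the fitted scale and the integer codes; the dequantized row is `[scale * qi for qi in q]`.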

