LGAI Mar 20

DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression

arXiv:2603.22324 82.7 h-index: 7
AI Analysis

This paper addresses knowledge corruption in post-training quantization of large language models, offering an incremental improvement over existing methods.

The paper addresses the loss of post-training knowledge during LLM weight compression. It introduces Delta-Aware Quantization (DAQ), which replaces standard reconstruction objectives with delta-aware metrics that optimize the directional fidelity of parameter deltas. In a pilot FP8 study, DAQ recovers style-specific capabilities that are lost under standard quantization.

We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that preserves the knowledge acquired during post-training. Standard quantization objectives minimize reconstruction error but are agnostic to the base model, allowing quantization noise to disproportionately corrupt the small-magnitude parameter deltas ($ΔW$) that encode post-training behavior -- an effect we analyze through the lens of quantization as implicit regularization. DAQ replaces reconstruction-based objectives with two delta-aware metrics -- Sign Preservation Rate and Cosine Similarity -- that directly optimize for directional fidelity of $ΔW$, requiring only the base and post-trained weight matrices. In a pilot FP8 study, DAQ recovers style-specific capabilities lost under standard quantization while maintaining general performance.
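
To make the two delta-aware metrics concrete, here is a minimal sketch in plain Python. It is not the authors' implementation, and the abstract does not give exact definitions. The sketch assumes that the quantized delta is the quantized post-trained weight minus the base weight, that Sign Preservation Rate is the fraction of entries whose delta sign survives quantization, and that Cosine Similarity is the usual normalized dot product between the original and quantized deltas. The function names and the toy weights are hypothetical, and `quantize_uniform` is a simple round-to-grid stand-in, not FP8.

```python
import math

def delta(w_post, w_base):
    """Parameter delta (post-trained minus base), element-wise."""
    return [p - b for p, b in zip(w_post, w_base)]

def sign(x):
    """Return -1, 0, or 1."""
    return (x > 0) - (x < 0)

def sign_preservation_rate(d_ref, d_q):
    """Assumed definition: fraction of entries whose delta sign is unchanged."""
    matches = sum(1 for r, q in zip(d_ref, d_q) if sign(r) == sign(q))
    return matches / len(d_ref)

def cosine_similarity(d_ref, d_q):
    """Cosine of the angle between the two deltas; 0.0 if either is all zeros."""
    dot = sum(r * q for r, q in zip(d_ref, d_q))
    norm = math.sqrt(sum(r * r for r in d_ref)) * math.sqrt(sum(q * q for q in d_q))
    return dot / norm if norm > 0 else 0.0

def quantize_uniform(w, step):
    """Stand-in quantizer (NOT FP8): round each weight to a uniform grid."""
    return [round(x / step) * step for x in w]

# Toy weights (hypothetical): the post-trained model differs only slightly from the base.
w_base = [0.50, -0.25, 0.125, 1.00]
w_post = [0.52, -0.26, 0.120, 1.03]
d_ref = delta(w_post, w_base)

# Coarse grid: every post-trained weight rounds back onto the base weight,
# so the quantized delta is all zeros.
d_coarse = delta(quantize_uniform(w_post, step=0.0625), w_base)
spr_coarse = sign_preservation_rate(d_ref, d_coarse)
cos_coarse = cosine_similarity(d_ref, d_coarse)

# Fine grid: the delta survives rounding.
d_fine = delta(quantize_uniform(w_post, step=0.01), w_base)
spr_fine = sign_preservation_rate(d_ref, d_fine)
cos_fine = cosine_similarity(d_ref, d_fine)

print(spr_coarse, cos_coarse, spr_fine, cos_fine)
```

In this toy case the coarse grid reconstructs each weight to within 0.03, yet it erases the delta entirely, which illustrates why a reconstruction-error objective can look acceptable while the post-training signal is lost. A delta-aware method would presumably use scores like these to choose quantization parameters, though how DAQ does so is not specified here.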

