LG · Jun 12, 2023

NF4 Isn't Information Theoretically Optimal (and that's Good)

arXiv:2306.06965v2 · 14 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses a theoretical gap in quantization methods for machine learning, though it appears incremental as it builds directly on existing NF4 quantization.

The paper challenges the claim that NF4 quantization is information-theoretically optimal for normally distributed weights, showing that the distribution of the values to be quantized depends on the block size. It then proposes a new code that minimizes the expected L1 reconstruction error, which improves performance for larger block sizes.
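A minimal sketch of why the block size matters, assuming synthetic N(0, 1) weights and plain-Python absmax scaling. The helper names `absmax_normalize` and `normalized_spread` are hypothetical illustrations, not taken from the paper or its repository. Each block is divided by its largest absolute value, so every block contributes exactly one ±1 entry; because the absmax of a larger block tends to be larger, the remaining values are pushed toward 0 as the block size grows. The spread of the normalized values therefore changes with block size, so there is no single fixed distribution for one code to be optimal against.

```python
import random
import statistics


def absmax_normalize(weights, block_size):
    """Blockwise absmax scaling: divide each block by its largest |value|,
    so every normalized value lies in [-1, 1]."""
    normalized = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) or 1.0  # guard against all-zero blocks
        normalized.extend(w / scale for w in block)
    return normalized


def normalized_spread(block_size, n_values=2**16, seed=0):
    """Std. dev. of the normalized values, excluding the single +-1 entry
    that each block's absmax element maps to."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n_values)]
    values = absmax_normalize(weights, block_size)
    interior = [v for v in values if abs(v) < 1.0]
    return statistics.pstdev(interior)


if __name__ == "__main__":
    for bs in (16, 64, 256, 1024, 4096):
        print(f"block_size={bs:5d}  spread={normalized_spread(bs):.3f}")
```

Running the script should show the spread shrinking as `block_size` grows; the exact numbers depend on the seed and the sample size.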

This note shares some simple calculations and experiments related to absmax-based blockwise quantization, as used in Dettmers et al., 2023. Their proposed NF4 data type is said to be information-theoretically optimal for representing normally distributed weights. I show that this can't quite be the case, as the distribution of the values to be quantized depends on the block size. I attempt to apply these insights to derive an improved code based on minimizing the expected L1 reconstruction error, rather than using the quantile-based method. This leads to improved performance for larger quantization block sizes, while both codes perform similarly at smaller block sizes.
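To make the L1 objective concrete, here is a hedged sketch comparing a quantile-based, NF4-style 16-level code with a code fitted to minimize the mean L1 error on absmax-normalized samples. Assumptions: the NF4-style construction (8 positive and 7 negative normal quantiles plus an exact zero, with `offset=0.9677083`) is our reading of the bitsandbytes table and may not match it bit for bit; the L1 fit uses a generic Lloyd-style k-medians iteration (assign each value to its nearest level, then move each level to the median of its cell), which is a swapped-in technique and not necessarily the derivation used in the note. All function names are hypothetical.

```python
import bisect
import random
import statistics


def absmax_normalize(weights, block_size):
    """Blockwise absmax scaling: each block is divided by its largest |value|."""
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) or 1.0  # guard against all-zero blocks
        out.extend(w / scale for w in block)
    return out


def nf4_style_code(offset=0.9677083):
    """ASSUMED NF4-style construction: 8 positive and 7 negative N(0, 1)
    quantiles plus an exact zero, rescaled so the extremes are exactly +-1."""
    inv_cdf = statistics.NormalDist().inv_cdf
    pos = [inv_cdf(offset + (0.5 - offset) * i / 8) for i in range(8)]
    neg = [-inv_cdf(offset + (0.5 - offset) * i / 7) for i in range(7)]
    levels = sorted(pos + neg + [0.0])
    top = levels[-1]
    return [v / top for v in levels]


def nearest_index(x, levels):
    """Index of the level closest to x (levels must be sorted ascending)."""
    i = bisect.bisect_left(levels, x)
    if i == 0:
        return 0
    if i == len(levels):
        return len(levels) - 1
    return i - 1 if x - levels[i - 1] <= levels[i] - x else i


def mean_l1_error(values, levels):
    """Mean |x - q(x)| when each value is rounded to its nearest level."""
    return statistics.fmean([abs(v - levels[nearest_index(v, levels)]) for v in values])


def l1_fitted_code(values, init_levels, n_iter=15):
    """Lloyd-style k-medians: assign values to their nearest level, then move each
    level to the median of its cell. No step increases the mean L1 error."""
    levels = sorted(init_levels)
    for _ in range(n_iter):
        cells = [[] for _ in levels]
        for v in values:
            cells[nearest_index(v, levels)].append(v)
        # Empty cells keep their previous level.
        new_levels = sorted(statistics.median(c) if c else lv
                            for c, lv in zip(cells, levels))
        if new_levels == levels:
            break  # converged: assignments and medians are stable
        levels = new_levels
    return levels


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 1.0) for _ in range(2**15)]
    nf4 = nf4_style_code()
    for block_size in (64, 4096):
        values = absmax_normalize(weights, block_size)
        fitted = l1_fitted_code(values, nf4)
        print(f"block_size={block_size:5d}  "
              f"NF4-style L1={mean_l1_error(values, nf4):.5f}  "
              f"fitted L1={mean_l1_error(values, fitted):.5f}")
```

Because no k-medians step can increase the mean L1 error, the fitted code is never worse than its NF4-style starting point on the samples it was fitted to. Fitting separately for each block size mirrors the note's point that the best code depends on the block size; the errors printed here are in-sample, so a fair comparison would evaluate on held-out weights.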

Code Implementations: 1 repo