LG IT NA
Aug 19, 2025

GDNSQ: Gradual Differentiable Noise Scale Quantization for Low-bit Neural Networks

arXiv:2508.14004v2, 2 citations, h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient deployment of neural networks on resource-constrained devices, though it appears incremental as it builds on existing quantization techniques.

The paper tackles low-bit neural network quantization by modeling the network as a chain of noisy channels and identifying the bottlenecks that emerge as bit-width decreases. Its differentiable method achieves competitive accuracy down to the W1A1 setting (1-bit weights and 1-bit activations).

Quantized neural networks can be viewed as a chain of noisy channels, where rounding in each layer reduces capacity as bit-width shrinks; the floating-point (FP) checkpoint sets the maximum input rate. We track capacity dynamics as the average bit-width decreases and identify resulting quantization bottlenecks by casting fine-tuning as a smooth, constrained optimization problem. Our approach employs a fully differentiable Straight-Through Estimator (STE) with learnable bit-width, noise scale and clamp bounds, and enforces a target bit-width via an exterior-point penalty; mild metric smoothing (via distillation) stabilizes training. Despite its simplicity, the method attains competitive accuracy down to the extreme W1A1 setting while retaining the efficiency of STE.
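To make the ingredients named in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It shows uniform fake quantization with clamp bounds and a bit-width parameter, the straight-through gradient rule for the rounding step, and a quadratic exterior-point penalty on the average bit-width. All function names, the quadratic form of the penalty, and the `weight` parameter are assumptions made for illustration; the learnable noise scale and the distillation term are not modeled here.

```python
# Illustrative sketch only: names and the penalty form are assumptions,
# not taken from the paper's code.

def fake_quantize(x, bits, lo, hi):
    """Clamp scalar x to [lo, hi] and snap it to a uniform grid of 2**bits levels."""
    levels = 2 ** bits - 1            # number of steps between lo and hi
    step = (hi - lo) / levels         # grid spacing
    clamped = min(max(x, lo), hi)
    return lo + round((clamped - lo) / step) * step


def ste_grad_wrt_input(x, lo, hi, upstream=1.0):
    """Straight-through estimator: rounding is treated as the identity in the
    backward pass, so the gradient passes through inside the clamp range and
    is zero outside it."""
    return upstream if lo <= x <= hi else 0.0


def exterior_penalty(bit_widths, target_bits, weight):
    """Quadratic exterior-point penalty (assumed form): zero while the mean
    bit-width is at or below the target, growing once the target is exceeded."""
    mean_bits = sum(bit_widths) / len(bit_widths)
    violation = max(0.0, mean_bits - target_bits)
    return weight * violation ** 2


if __name__ == "__main__":
    # 1-bit quantization on [-1, 1] has only the two levels -1 and +1.
    print(fake_quantize(0.3, 1, -1.0, 1.0))
    print(fake_quantize(-0.3, 1, -1.0, 1.0))
    # Layers averaging 3 bits against a 2-bit target incur a penalty.
    print(exterior_penalty([4, 2], target_bits=2, weight=1.0))
```

In a training loop under these assumptions, the penalty would be added to the task loss so that the per-layer bit-widths are pushed toward the target average, while the straight-through rule keeps the quantizer usable with gradient descent.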
