LG AR Oct 9, 2025

LOTION: Smoothing the Optimization Landscape for Quantized Training

arXiv:2510.08757v1, 3 citations, h-index: 96
Originality: Incremental advance
AI Analysis

This work helps machine learning practitioners train quantized neural networks more effectively. It offers a principled method with convergence guarantees, though the advance is incremental because it builds on existing smoothing techniques.

The paper tackles the challenge of optimizing neural networks for quantized objectives by introducing LOTION, a framework that uses stochastic-noise smoothing to approximate the quantized loss surface with a continuous one. Empirically, it outperforms standard quantization-aware training (QAT) on synthetic testbeds and on language models with 150M and 300M parameters.

Optimizing neural networks for quantized objectives is fundamentally challenging because the quantizer is piecewise constant, yielding zero gradients everywhere except at quantization thresholds, where the derivative is undefined. Most existing methods deal with this issue by relaxing gradient computations with techniques like straight-through estimators (STE) and do not provide any guarantees of convergence. In this work, taking inspiration from Nesterov smoothing, we approximate the quantized loss surface with a continuous loss surface. In particular, we introduce LOTION, Low-precision Optimization via sTochastic-noIse smOothiNg, a principled smoothing framework that replaces the raw quantized loss with its expectation under unbiased randomized-rounding noise. In this framework, standard optimizers are guaranteed to converge to a local minimum of the loss surface. Moreover, when using noise derived from stochastic rounding, we show that the global minima of the original quantized loss are preserved. We empirically demonstrate that this method outperforms standard QAT on synthetic testbeds and on 150M- and 300M-parameter language models.
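
To make the abstract's central idea concrete, here is a minimal NumPy sketch, not the paper's implementation. It contrasts a deterministic round-to-nearest quantizer, which is piecewise constant, with unbiased stochastic rounding, and computes the exact expected loss under stochastic rounding for a single scalar weight. The function names, the uniform grid with spacing `step`, and the scalar setting are all assumptions chosen for illustration.

```python
import numpy as np


def round_to_nearest(w, step=1.0):
    """Deterministic quantizer on a uniform grid: piecewise constant in w."""
    return step * np.round(np.asarray(w, dtype=float) / step)


def stochastic_round(w, step=1.0, rng=None):
    """Unbiased randomized rounding to the two neighbouring grid points.

    Rounds up with probability equal to the fractional position of w
    between its neighbours, so the expected output equals w.
    """
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(w, dtype=float) / step
    lower = np.floor(scaled)
    prob_up = scaled - lower  # in [0, 1)
    round_up = rng.random(scaled.shape) < prob_up
    return step * (lower + round_up)


def smoothed_loss_1d(loss, w, step=1.0):
    """Exact expectation of loss(stochastic_round(w)) for a scalar w.

    With one weight there are only two possible outcomes, so the
    expectation is a weighted sum. The result is continuous in w and
    equals loss(w) whenever w already lies on the grid.
    """
    scaled = w / step
    lower = np.floor(scaled)
    prob_up = scaled - lower
    loss_down = loss(step * lower)
    loss_up = loss(step * (lower + 1.0))
    return (1.0 - prob_up) * loss_down + prob_up * loss_up
```

For a toy loss such as `lambda q: (q - 0.3) ** 2`, `loss(round_to_nearest(w))` stays flat as `w` moves between thresholds, whereas `smoothed_loss_1d` interpolates between the losses at the two neighbouring grid points and so has a usable slope. In this scalar sketch the smoothed loss coincides with the quantized loss at grid points, which matches the intuition behind the abstract's claim about preserved minima; the high-dimensional case treated in the paper is not covered here.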

