LG · NA · Nov 2, 2025

Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding

arXiv:2511.00874v1 · 3 citations · h-index: 7 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the resource cost of LLM training on edge devices, offering an incremental improvement: adjusting the batch size when training with stochastic rounding.

The paper tackles the problem of quantization noise hindering convergence in low-precision LLM training by showing that increased batch sizes can compensate for reduced precision during back-propagation, with experiments validating a 15% reduction in memory usage and competitive accuracy on edge devices.
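A toy model can illustrate the batch-size claim. The sketch below rests on several assumptions and is not the paper's analysis. It draws hypothetical per-example gradients from a standard normal, stochastically rounds each one onto a uniform grid of spacing `step` (a simplified stand-in for a low-precision format), and averages them in full precision. Each rounding error has zero mean and variance at most `step**2 / 4`, and the errors are independent. The error of the mini-batch mean therefore has variance at most `step**2 / (4 * B)`. The helper names `stochastic_round` and `sr_error_variance` are hypothetical.

```python
import math
import random
import statistics


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x onto the grid {k * step}, rounding up with probability equal to
    the fractional distance past the lower grid point, so E[result] == x."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


def sr_error_variance(batch_size: int, step: float, trials: int, seed: int = 0) -> float:
    """Empirical variance of the quantization error of a mini-batch mean gradient
    when every per-example gradient is stochastically rounded (toy model only)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Hypothetical per-example gradients; the distribution is an assumption.
        grads = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        exact_mean = sum(grads) / batch_size
        quantized_mean = sum(stochastic_round(g, step, rng) for g in grads) / batch_size
        errors.append(quantized_mean - exact_mean)
    return statistics.pvariance(errors)


if __name__ == "__main__":
    step = 0.5
    for b in (1, 4, 16, 64):
        bound = step ** 2 / (4 * b)
        print(f"B={b:3d}  variance={sr_error_variance(b, step, 2000):.6f}  bound={bound:.6f}")
```

Under this toy model, doubling `step` (one bit less resolution over a fixed range) multiplies the noise bound by 4, and a 4x larger batch offsets that. For example, `sr_error_variance(256, 1.0, 1000)` and `sr_error_variance(64, 0.5, 1000)` should come out similar. The sketch does not model quantized back-propagation through a real network. Whether this trade-off carries over there is the question the paper studies.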

LLM training is resource-intensive. Quantized training improves computational and memory efficiency but introduces quantization noise, which can hinder convergence and degrade model accuracy. Stochastic Rounding (SR) has emerged as a theoretically attractive alternative to deterministic rounding, offering unbiased gradient estimates. However, its interaction with other training factors, especially batch size, remains underexplored. In this paper, we present a theoretical and empirical study of mini-batch stochastic gradient descent (SGD) with SR, showing that increased batch sizes can compensate for reduced precision during back-propagation. Furthermore, we show that quantizing weights and activations impacts gradient variance in distinct ways. Our experiments validate these theoretical insights.
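A few lines of code show the contrast between stochastic and deterministic rounding. This is a generic textbook formulation of SR onto a uniform grid of spacing `step`, used here as a simplified stand-in for a real low-precision number format. It is not the paper's code. `stochastic_round`, `round_to_nearest`, and `accumulate` are hypothetical helper names. The example repeatedly adds an update smaller than half a grid step to a quantized weight. Round-to-nearest discards every update, so the weight never moves. SR moves the weight by the correct amount on average.

```python
import math
import random
from typing import Callable, Iterable


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Unbiased rounding onto {k * step}: round up with probability equal to
    the fractional part of x / step, otherwise round down."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


def round_to_nearest(x: float, step: float) -> float:
    """Deterministic round-to-nearest onto the same grid (Python's round: ties to even)."""
    return round(x / step) * step


def accumulate(updates: Iterable[float], rounder: Callable[[float], float]) -> float:
    """Apply each update to a weight stored in low precision, re-quantizing
    after every step as a low-precision weight update would."""
    w = 0.0
    for u in updates:
        w = rounder(w + u)
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    step, update, n = 1.0, 0.1, 1000
    rtn = accumulate([update] * n, lambda v: round_to_nearest(v, step))
    sr_runs = [accumulate([update] * n, lambda v: stochastic_round(v, step, rng)) for _ in range(50)]
    print("exact:", update * n)
    print("round-to-nearest:", rtn)
    print("stochastic rounding (mean of 50 runs):", sum(sr_runs) / len(sr_runs))
```

Round-to-nearest stays at 0.0 because each intermediate value 0.1 rounds back down. The SR average lands near the exact total of 100. The price is extra variance per step, which is the quantity whose interaction with batch size the paper analyzes.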
