LG · AI · Jun 1

FOAM: Frequency and Operator Error-Based Adaptive Damping Method for Reducing Staleness-Oriented Error for Shampoo

arXiv:2606.02365 · 56.2
Predicted impact: top 42% in LG (last 90 days) · Originality: Incremental advance
AI Analysis

For practitioners using Shampoo in large-scale optimization, FOAM is reported to reduce wall-clock time relative to standard Shampoo while maintaining robust convergence.

Shampoo's matrix inversion is expensive, so practitioners reuse stale preconditioners between updates, which degrades performance and introduces numerical instability. FOAM adaptively controls the damping factor and the eigendecomposition frequency, using an approximation of the staleness-oriented error, to reduce wall-clock time while maintaining robust convergence.

Shampoo is attracting considerable attention for its superior performance on large-scale optimization benchmarks; yet it faces a significant practical bottleneck: the prohibitive computational overhead of matrix inversion. To mitigate this, practitioners typically rely on stale preconditioner updates, creating a fundamental trade-off between computational efficiency and optimization fidelity. In this work, we provide a theoretical study of staleness through the complementary lenses of convergence and stability. While staleness improves computational efficiency, it inherently degrades performance and introduces numerical instability. Crucially, we identify that damping, acting as a numerical stabilizer, can effectively suppress these negative effects. Guided by this analysis, we propose FOAM, an adaptive algorithm that stabilizes training by dynamically controlling both the damping factor and the eigendecomposition frequency based on an approximation of the staleness-oriented error. Experimental results demonstrate that FOAM reduces wall-clock time compared to standard Shampoo while maintaining robust convergence.
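To make the trade-off described above concrete, here is a minimal NumPy sketch of a Shampoo-style update whose damped inverse roots are refreshed only every few steps. It is not the FOAM algorithm: the abstract does not give FOAM's control rule, so the fixed `damping` and `update_every` values, the function names, and the `drift` proxy (relative change in the left statistics since the last eigendecomposition) are all assumptions made for illustration.

```python
import numpy as np


def inv_root(mat, power, damping):
    """Return (mat + damping * I)^(-1/power) via a symmetric eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Clamp tiny negative eigenvalues from round-off, then apply damping.
    eigvals = np.maximum(eigvals, 0.0) + damping
    return (eigvecs * eigvals ** (-1.0 / power)) @ eigvecs.T


def shampoo_stale(grad_fn, w, steps, lr=0.1, damping=1e-4, update_every=10):
    """Shampoo-style steps on a matrix parameter with stale preconditioners.

    Hypothetical helper: the inverse roots are recomputed only every
    `update_every` steps; `drift` records how far the left statistics have
    moved since the last refresh (an illustrative proxy, not FOAM's error).
    """
    m, n = w.shape
    left = np.zeros((m, m))   # accumulates G G^T
    right = np.zeros((n, n))  # accumulates G^T G
    left_root, right_root = np.eye(m), np.eye(n)
    left_snapshot = left.copy()
    drift = []
    for t in range(steps):
        g = grad_fn(w)
        left += g @ g.T
        right += g.T @ g
        if t % update_every == 0:
            # The expensive part: eigendecompositions, done infrequently.
            left_root = inv_root(left, 4, damping)
            right_root = inv_root(right, 4, damping)
            left_snapshot = left.copy()
        rel = np.linalg.norm(left - left_snapshot) / (np.linalg.norm(left) + 1e-12)
        drift.append(rel)
        w = w - lr * left_root @ g @ right_root
    return w, drift


# Toy problem (assumed for illustration): minimize 0.5 * ||W - target||_F^2.
target = np.arange(1.0, 7.0).reshape(2, 3)
w_final, drift = shampoo_stale(lambda w: w - target, np.zeros((2, 3)), steps=25)
```

The `drift` list is zero on steps where the preconditioner is refreshed and grows in between, which is the kind of signal an adaptive scheme could use to raise the damping or trigger an earlier eigendecomposition. How FOAM actually approximates and uses its staleness-oriented error is specified in the paper, not here.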

