LG | AI | Dec 13, 2025

CurvaDion: Curvature-Adaptive Distributed Orthonormalization

arXiv:2512.13728v1 | 1 citation
Originality: Highly original
AI Analysis

The paper addresses communication overhead in distributed training of large-scale AI models, reporting a 99% reduction in communication while matching baseline convergence.

The paper tackles the bottleneck of gradient synchronization in distributed training of large language models by introducing CurvaDion, which adaptively synchronizes only in high-curvature regions, achieving a 99% communication reduction while matching baseline convergence across models from 160M to 1.3B parameters.

As language models scale to trillions of parameters, distributed training across many GPUs becomes essential, yet gradient synchronization over high-bandwidth, low-latency networks remains a critical bottleneck. While recent methods like Dion reduce per-step communication through low-rank updates, they synchronize at every step regardless of the optimization landscape. We observe that synchronization requirements vary dramatically throughout training: workers naturally compute similar gradients in flat regions, making frequent synchronization redundant, while high-curvature regions require coordination to prevent divergence. We introduce CurvaDion, which uses Relative Maximum Momentum Change (RMMC) to detect high-curvature regions requiring synchronization. RMMC leverages momentum dynamics, which are already computed during optimization, as a computationally tractable proxy for directional curvature, adding only $\mathcal{O}(d)$ operations per layer. We establish theoretical connections between RMMC and loss curvature and demonstrate that CurvaDion achieves 99% communication reduction while matching baseline convergence across models from 160M to 1.3B parameters.
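
The abstract does not give the exact RMMC formula, so the following is only a minimal sketch of the general idea of a momentum-based synchronization trigger. It assumes, hypothetically, that the per-layer signal is the relative change ||m_t - m_{t-1}|| / ||m_{t-1}|| and that workers synchronize when the maximum over layers exceeds a threshold. The function names, the `eps` guard, and the `threshold` value are illustrative and not taken from the paper.

```python
import math

def relative_momentum_change(prev_momentum, momentum, eps=1e-12):
    """Assumed per-layer signal: ||m_t - m_{t-1}|| / (||m_{t-1}|| + eps).

    One pass over the d entries of a layer's momentum, so O(d) work.
    """
    diff_sq = sum((m - p) ** 2 for m, p in zip(momentum, prev_momentum))
    prev_sq = sum(p * p for p in prev_momentum)
    return math.sqrt(diff_sq) / (math.sqrt(prev_sq) + eps)

def should_synchronize(prev_layers, curr_layers, threshold):
    """Return (sync, signal): sync is True when the largest per-layer
    relative change exceeds the (hypothetical) threshold."""
    signal = max(
        relative_momentum_change(p, m)
        for p, m in zip(prev_layers, curr_layers)
    )
    return signal > threshold, signal

# Two layers of flattened momentum from the previous step.
prev = [[1.0, 0.0], [0.0, 2.0]]
# Flat region: momentum barely moves, so the sketch skips synchronization.
flat = [[1.0, 0.01], [0.0, 2.0]]
# High-curvature region: the first layer's momentum rotates sharply.
sharp = [[0.0, 1.0], [0.0, 2.0]]

flat_sync, flat_signal = should_synchronize(prev, flat, threshold=0.1)
sharp_sync, sharp_signal = should_synchronize(prev, sharp, threshold=0.1)
```

In this sketch the decision reuses momentum buffers the optimizer already holds, which matches the abstract's point that the proxy adds only linear work per layer. How the real method sets the threshold, and how workers agree on the decision, is not covered here.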

