LG | AI | Sep 25, 2023

Revisiting LARS for Large Batch Training Generalization of Neural Networks

arXiv:2309.14053v5 | 9 citations | h-index: 55
Originality: Incremental advance
AI Analysis

This work addresses the challenge of training neural networks with large batches more effectively, offering incremental improvements over existing methods like LARS and LAMB.

The paper tackles generalization in large batch training of neural networks by proposing Time Varying LARS (TVLARS), which replaces warm-up with a configurable sigmoid-like function. The aim is to avoid sharp minimizers early in training and to transition to LARS for robustness later. Reported gains are up to 2% in classification and up to 10% in self-supervised learning.

This paper explores Large Batch Training techniques using layer-wise adaptive scaling ratio (LARS) across diverse settings and reports two findings. First, LARS algorithms with warm-up tend to be trapped in sharp minimizers early on due to redundant ratio scaling. Second, a fixed steep decline in the latter phase restricts deep neural networks from effectively navigating early-phase sharp minimizers. Building on these findings, we propose Time Varying LARS (TVLARS), a novel algorithm that replaces warm-up with a configurable sigmoid-like function for robust training in the initial phase. TVLARS promotes gradient exploration early on, escaping sharp minimizers, and gradually transitions to LARS for robustness in later phases. Extensive experiments demonstrate that TVLARS outperforms LARS and LAMB in most cases, with up to 2% improvement in classification scenarios. Notably, in all self-supervised learning cases, TVLARS outperforms LARS and LAMB, with performance improvements of up to 10%.
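
As a rough illustration of the idea in the abstract, the sketch below combines a LARS-style layer-wise ratio with a sigmoid-like multiplier that is large early in training and decays toward plain LARS. The abstract does not give the paper's exact formula, so the functional form and every name here (`sigmoid_like_scale`, `delay_steps`, `decay_rate`, `alpha`, `floor`) are assumptions made for illustration, not the authors' implementation.

```python
import math


def lars_trust_ratio(weight_norm, grad_norm, weight_decay=0.0, eps=1e-9):
    """Layer-wise ratio ||w|| / (||g|| + weight_decay * ||w||), as in LARS-style optimizers."""
    if weight_norm == 0.0 or grad_norm == 0.0:
        return 1.0  # fall back to the unscaled rate when a norm vanishes
    return weight_norm / (grad_norm + weight_decay * weight_norm + eps)


def sigmoid_like_scale(step, delay_steps, decay_rate, alpha=1.0, floor=1.0):
    """Hypothetical time-varying multiplier (assumed form, not taken from the paper).

    Computes floor + 1 / (alpha + exp(decay_rate * (step - delay_steps))).
    With alpha > 0 it starts near floor + 1 / alpha and decays smoothly to floor,
    so floor=1.0 recovers the plain LARS rate late in training.
    """
    # Clamp the exponent so math.exp cannot overflow for very large steps.
    exponent = min(decay_rate * (step - delay_steps), 700.0)
    return floor + 1.0 / (alpha + math.exp(exponent))


def layer_learning_rate(base_lr, weight_norm, grad_norm, step,
                        delay_steps=100, decay_rate=0.05):
    """Per-layer rate: base rate x layer-wise ratio x time-varying multiplier."""
    ratio = lars_trust_ratio(weight_norm, grad_norm)
    return base_lr * ratio * sigmoid_like_scale(step, delay_steps, decay_rate)


if __name__ == "__main__":
    # Print how the per-layer rate shrinks as training progresses.
    for step in (0, 50, 100, 200, 1000):
        print(step, layer_learning_rate(0.1, weight_norm=2.0, grad_norm=4.0, step=step))
```

In this sketch the multiplier is largest at step 0, which is the opposite of warm-up (which starts small and ramps up); that choice follows the abstract's description of promoting exploration early and settling into LARS later, but the specific constants are arbitrary.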

