LG OC ML
Nov 9, 2023

A Coefficient Makes SVRG Effective

Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu

arXiv:2311.05589v2 | 8.86 citations | h-index: 139 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of applying SVRG in deep learning. It is an incremental advance of interest to researchers and practitioners in optimization and neural network training.

The paper makes Stochastic Variance Reduced Gradient (SVRG) effective for optimizing deep neural networks by introducing a multiplicative coefficient α, adjusted with a linear decay schedule, to control the strength of the variance reduction term. The resulting method consistently reduces training loss compared to the baseline and standard SVRG across various model architectures and image classification datasets.

Stochastic Variance Reduced Gradient (SVRG), introduced by Johnson & Zhang (2013), is a theoretically compelling optimization method. However, as Defazio & Bottou (2019) highlight, its effectiveness in deep learning is yet to be proven. In this work, we demonstrate the potential of SVRG in optimizing real-world neural networks. Our empirical analysis finds that, for deeper neural networks, the strength of the variance reduction term in SVRG should be smaller and decrease as training progresses. Inspired by this, we introduce a multiplicative coefficient $α$ to control the strength and adjust it through a linear decay schedule. We name our method $α$-SVRG. Our results show $α$-SVRG better optimizes models, consistently reducing training loss compared to the baseline and standard SVRG across various model architectures and multiple image classification datasets. We hope our findings encourage further exploration into variance reduction techniques in deep learning. Code is available at github.com/davidyyd/alpha-SVRG.
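The idea described in the abstract can be illustrated with a minimal sketch on a toy least-squares problem. It assumes that the coefficient α multiplies the standard SVRG correction term from Johnson & Zhang (2013), and that the linear schedule decays α from an initial value to zero over training; the end value, the function names, the hyperparameters, and the toy objective are all illustrative assumptions, not taken from the paper or its released code.

```python
import numpy as np


def linear_decay(alpha0, step, total_steps):
    # Assumed schedule: alpha0 at step 0, decaying linearly to 0 at total_steps.
    return alpha0 * (1.0 - step / total_steps)


def alpha_svrg_gradient(grad_i, grad_i_snap, full_grad_snap, alpha):
    # Stochastic gradient minus the SVRG correction term, scaled by alpha.
    # alpha = 0 gives plain SGD; alpha = 1 gives the standard SVRG estimate.
    return grad_i - alpha * (grad_i_snap - full_grad_snap)


def half_mse(X, y, theta):
    # Objective: 0.5 * mean squared residual.
    return 0.5 * float(np.mean((X @ theta - y) ** 2))


def train_alpha_svrg(X, y, alpha0=0.5, lr=0.05, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        # Take a snapshot once per epoch and compute its full gradient.
        snapshot = theta.copy()
        full_grad = X.T @ (X @ snapshot - y) / n
        for i in rng.permutation(n):
            alpha = linear_decay(alpha0, step, total_steps)
            # Per-sample gradients at the current point and at the snapshot.
            g_i = X[i] * (X[i] @ theta - y[i])
            g_i_snap = X[i] * (X[i] @ snapshot - y[i])
            theta -= lr * alpha_svrg_gradient(g_i, g_i_snap, full_grad, alpha)
            step += 1
    return theta
```

In this sketch, setting `alpha0=0` reduces the update to plain SGD, while replacing the schedule with a constant 1 would give the standard SVRG update, so the coefficient interpolates between the two.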
