LG, SP | Dec 20, 2022

Input Normalized Stochastic Gradient Descent Training of Deep Neural Networks

arXiv:2212.09921v2 | 1 citation | h-index: 37
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of selecting optimizer parameters, particularly the learning rate, when training complex models on large datasets. The contribution appears incremental, since it builds on existing normalization methods.

The paper proposes Input Normalized Stochastic Gradient Descent (INSGD) for training deep neural networks. INSGD normalizes the learning rate using the input vector to each neuron, which is intended to avoid divergence and improve accuracy. The reported gains are small: for example, ResNet-18 accuracy on CIFAR-10 rises from 92.42% to 92.71%, a gain of 0.29 percentage points.

In this paper, we propose a novel optimization algorithm for training machine learning models called Input Normalized Stochastic Gradient Descent (INSGD), inspired by the Normalized Least Mean Squares (NLMS) algorithm used in adaptive filtering. When training complex models on large datasets, the choice of optimizer parameters, particularly the learning rate, is crucial to avoid divergence. Our algorithm updates the network weights using stochastic gradient descent with $\ell_1$- and $\ell_2$-based normalizations applied to the learning rate, similar to NLMS. However, unlike existing normalization methods, we exclude the error term from the normalization process and instead normalize the update term using the input vector to the neuron. Our experiments demonstrate that our optimization algorithm achieves higher accuracy levels compared to different initialization settings. We evaluate the efficiency of our training algorithm on benchmark datasets using ResNet-18, WResNet-20, ResNet-50, and a toy neural network. Our INSGD algorithm improves the accuracy of ResNet-18 on CIFAR-10 from 92.42% to 92.71%, WResNet-20 on CIFAR-100 from 76.20% to 77.39%, and ResNet-50 on ImageNet-1K from 75.52% to 75.67%.
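The update rule described in the abstract can be illustrated with a minimal sketch for a single linear neuron with squared-error loss. This is not the authors' implementation: the function name `insgd_step`, the `eps` stabilizer, the choice of the squared $\ell_2$ norm versus the plain $\ell_1$ norm, and the default values are all assumptions made for illustration. The only property taken from the abstract is that the step size is divided by a norm of the input vector, and that the error term is not part of the normalizer.

```python
def insgd_step(w, x, y, lr=0.5, eps=1e-8, norm="l2"):
    """One input-normalized SGD step for a single linear neuron (illustrative sketch).

    Assumed loss: L = 0.5 * (y - w.x)^2. All names and defaults are hypothetical.
    """
    # Forward pass and prediction error.
    y_hat = sum(wi * xi for wi, xi in zip(w, x))
    err = y - y_hat

    # Plain SGD gradient of the loss with respect to the weights: dL/dw = -err * x.
    grad = [-err * xi for xi in x]

    # The normalizer depends only on the input vector, never on the error.
    if norm == "l2":
        scale = sum(xi * xi for xi in x)   # squared l2 norm, as in NLMS (assumption)
    elif norm == "l1":
        scale = sum(abs(xi) for xi in x)   # l1 norm variant (assumption)
    else:
        raise ValueError("norm must be 'l1' or 'l2'")

    # Effective learning rate shrinks as the input grows, which limits the step size.
    step = lr / (eps + scale)
    return [wi - step * gi for wi, gi in zip(w, grad)]


# Usage: the same learning rate behaves identically for small and large inputs.
w_small = insgd_step([0.0, 0.0], [3.0, 4.0], 5.0, lr=1.0, eps=0.0)
w_large = insgd_step([0.0, 0.0], [3000.0, 4000.0], 5.0, lr=1.0, eps=0.0)
```

For this linear case with `lr=1.0`, `eps=0.0` and the squared $\ell_2$ normalizer, a single step makes the neuron reproduce the target on that sample regardless of the input scale, which is the NLMS property that motivates the normalization. In a deep network the same idea would be applied per neuron or per layer using that layer's input, and how the paper handles batches and convolutional layers is not stated in the abstract.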

Code Implementations: 1 repo