LG, Mar 29, 2023

Lipschitzness Effect of a Loss Function on Generalization Performance of Deep Neural Networks Trained by Adam and AdamW Optimizers

arXiv:2303.16464v3, 5 citations, h-index: 6
Originality: Incremental advance
AI Analysis

The paper provides a guideline for selecting loss functions in Adam-based training, but the advance is incremental because it builds on existing optimization theory.

The paper proves theoretically that a smaller Lipschitz constant of the loss function is associated with lower generalization error for models trained with Adam or AdamW, and shows experimentally that losses with lower Lipschitz constants generalize better on human age estimation under distribution shift.

The generalization performance of deep neural networks with regard to the optimization algorithm is one of the major concerns in machine learning. This performance can be affected by various factors. In this paper, we theoretically prove that the Lipschitz constant of a loss function is an important factor in reducing the generalization error of the output model obtained by Adam or AdamW. The results can be used as a guideline for choosing the loss function when the optimization algorithm is Adam or AdamW. In addition, to evaluate the theoretical bound in a practical setting, we choose the human age estimation problem in computer vision. To assess generalization more rigorously, the training and test datasets are drawn from different distributions. Our experimental evaluation shows that the loss function with a lower Lipschitz constant and maximum value improves the generalization of the model trained by Adam or AdamW.
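
To make the two quantities named in the abstract concrete, here is a minimal sketch that estimates the Lipschitz constant and the maximum value of a loss over a bounded prediction range. A loss ℓ is L-Lipschitz in its prediction argument if |ℓ(a, t) − ℓ(b, t)| ≤ L·|a − b| for all predictions a and b. The two losses (absolute and squared error), the age range [0, 100], and the helper names `estimate_lipschitz` and `max_value` are assumptions chosen for illustration; they are not the losses or the procedure used in the paper.

```python
def absolute_error(pred, target):
    """Illustrative loss: |pred - target|."""
    return abs(pred - target)


def squared_error(pred, target):
    """Illustrative loss: (pred - target)^2."""
    return (pred - target) ** 2


def estimate_lipschitz(loss, target, lo, hi, steps=1000):
    """Largest finite-difference slope of loss(., target) on a uniform grid over [lo, hi].

    This is a numerical lower estimate of the true Lipschitz constant on the interval.
    """
    h = (hi - lo) / steps
    best = 0.0
    prev = loss(lo, target)
    for i in range(1, steps + 1):
        cur = loss(lo + i * h, target)
        best = max(best, abs(cur - prev) / h)
        prev = cur
    return best


def max_value(loss, target, lo, hi, steps=1000):
    """Largest value of loss(., target) on the same uniform grid."""
    h = (hi - lo) / steps
    return max(loss(lo + i * h, target) for i in range(steps + 1))


if __name__ == "__main__":
    # Hypothetical setting: ages in [0, 100], worst-case target at the boundary.
    for name, loss in [("absolute", absolute_error), ("squared", squared_error)]:
        lip = estimate_lipschitz(loss, target=0.0, lo=0.0, hi=100.0)
        top = max_value(loss, target=0.0, lo=0.0, hi=100.0)
        print(f"{name}: Lipschitz estimate = {lip:.2f}, max value = {top:.2f}")
```

On this range the absolute error has a Lipschitz constant of about 1 and a maximum of about 100, while the squared error has a Lipschitz constant close to 200 and a maximum of about 10000, so under the abstract's guideline the former would be the preferred choice of the two.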
