CV · Jun 11, 2018

Data augmentation instead of explicit regularization

arXiv:1806.03852v5 · 164 citations
Originality: Incremental advance
AI Analysis

This work targets researchers and practitioners facing inefficient use of computational resources and hyperparameter sensitivity in deep learning. It suggests a shift towards data augmentation to improve generalization and reduce carbon emissions.

The paper investigates the interplay between explicit regularization techniques, such as weight decay and dropout, and the implicit regularization provided by data augmentation in deep learning. It finds that models trained with data augmentation alone perform as well as or better than models that also use explicit regularization, which can degrade performance if not carefully tuned.

Contrary to most machine learning models, modern deep artificial neural networks typically include multiple components that contribute to regularization. Despite the fact that some (explicit) regularization techniques, such as weight decay and dropout, require costly fine-tuning of sensitive hyperparameters, the interplay between them and other elements that provide implicit regularization is not yet well understood. Shedding light upon these interactions is key to efficiently using computational resources and may contribute to solving the puzzle of generalization in deep learning. Here, we first provide formal definitions of explicit and implicit regularization that help to understand the essential differences between techniques. Second, we contrast data augmentation with weight decay and dropout. Our results show that visual object categorization models trained with data augmentation alone achieve the same performance or higher than models also trained with weight decay and dropout, as is common practice. We conclude that the contribution of weight decay and dropout to generalization is not only superfluous when sufficient implicit regularization is provided, but that such techniques can also dramatically degrade performance if the hyperparameters are not carefully tuned for the architecture and data set. In contrast, data augmentation systematically provides large generalization gains and does not require hyperparameter re-tuning. In view of our results, we suggest optimizing neural networks without weight decay and dropout to save computational resources, and hence carbon emissions, and focusing more on data augmentation and other inductive biases to improve performance and robustness.
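To make the contrast in the abstract concrete, here is a minimal sketch of what "data augmentation alone" could look like in a training setup. It is an illustration under stated assumptions, not the authors' implementation: the function `augment`, the choice of transformations (random horizontal flip plus a small zero-padded translation), the `max_shift` parameter, and both settings dictionaries are hypothetical, and the weight decay and dropout values in `COMMON_PRACTICE` are placeholders rather than figures from the paper.

```python
import random


def augment(image, max_shift=1, rng=random):
    """Return a randomly flipped and translated copy of a 2D image.

    `image` is a list of equal-length rows. The transformations here are
    illustrative assumptions, not the augmentation scheme from the paper.
    """
    height, width = len(image), len(image[0])

    # Random horizontal flip with probability 0.5 (builds new rows,
    # so the caller's image is never modified).
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]

    # Random translation of up to `max_shift` pixels along each axis.
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)

    # Pixels shifted in from outside the image are filled with zeros.
    shifted = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                shifted[y][x] = image[src_y][src_x]
    return shifted


# Hypothetical settings contrasting the two regimes discussed in the abstract.
# Augmentation only: no explicit regularization hyperparameters to tune.
AUGMENTATION_ONLY = {"augment": True, "weight_decay": 0.0, "dropout_rate": 0.0}
# Common practice: placeholder values that would normally need re-tuning
# for each architecture and data set.
COMMON_PRACTICE = {"augment": True, "weight_decay": 5e-4, "dropout_rate": 0.5}
```

In a training loop, `augment` would be applied to each input before the forward pass, so the network sees a slightly different version of every image in each epoch. Under the `AUGMENTATION_ONLY` settings there are no weight decay or dropout values left to tune, which is the saving in computation the abstract points to.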

Code Implementations: 2 repos