LG | ML | Feb 28, 2020

The Implicit and Explicit Regularization Effects of Dropout

arXiv:2002.12915v3 | 130 citations
AI Analysis

This work provides a deeper theoretical understanding of dropout for machine learning practitioners, though it is incremental as it builds on prior studies of explicit effects.

The paper addresses how dropout regularizes training: it demonstrates that dropout introduces both explicit and implicit regularization effects, and it shows that the derived analytic regularizers can accurately replace dropout in practice.

Dropout is a widely-used regularization technique, often required to obtain state-of-the-art for a number of architectures. This work demonstrates that dropout introduces two distinct but entangled regularization effects: an explicit effect (also studied in prior work) which occurs since dropout modifies the expected training objective, and, perhaps surprisingly, an additional implicit effect from the stochasticity in the dropout training update. This implicit regularization effect is analogous to the effect of stochasticity in small mini-batch stochastic gradient descent. We disentangle these two effects through controlled experiments. We then derive analytic simplifications which characterize each effect in terms of the derivatives of the model and the loss, for deep neural networks. We demonstrate these simplified, analytic regularizers accurately capture the important aspects of dropout, showing they faithfully replace dropout in practice.
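The explicit/implicit split described in the abstract can be illustrated on a toy case. The sketch below is not the paper's derivation for deep networks; it assumes a single linear model with squared loss and inverted dropout applied to the inputs, where the expected objective has a well-known closed form. All names (`enumerate_masks`, `explicit_regularizer`, `keep_prob`, and the sample values) are hypothetical and chosen only for illustration. It enumerates every dropout mask exactly, so no sampling error is involved.

```python
import itertools
import numpy as np

def enumerate_masks(x, keep_prob):
    """Yield (probability, dropped-out input) for every binary mask.

    Assumes inverted dropout: kept coordinates are rescaled by 1 / keep_prob.
    """
    for mask in itertools.product([0, 1], repeat=len(x)):
        m = np.array(mask, dtype=float)
        prob = np.prod(np.where(m == 1, keep_prob, 1.0 - keep_prob))
        yield prob, x * m / keep_prob

def explicit_regularizer(w, x, keep_prob):
    """Closed-form gap between expected dropout loss and plain loss
    for a linear model with squared loss (toy case only)."""
    return (1.0 - keep_prob) / keep_prob * np.sum((w * x) ** 2)

# Hypothetical toy values.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, -0.5])
y = 0.3
keep_prob = 0.8

# Explicit effect: dropout changes the *expected* training objective.
plain_loss = (y - w @ x) ** 2
expected_loss = sum(p * (y - w @ xt) ** 2 for p, xt in enumerate_masks(x, keep_prob))
gap = expected_loss - plain_loss
reg = explicit_regularizer(w, x, keep_prob)

# Implicit effect: each update uses one sampled mask, so its gradient equals
# the gradient of the expected objective plus mean-zero noise.
def grad_for_input(w, xt, y):
    # Gradient of (y - w.xt)^2 with respect to w.
    return -2.0 * (y - w @ xt) * xt

expected_grad = (-2.0 * (y - w @ x) * x
                 + 2.0 * (1.0 - keep_prob) / keep_prob * w * x ** 2)
mean_grad = sum(p * grad_for_input(w, xt, y) for p, xt in enumerate_masks(x, keep_prob))
noise_mean = mean_grad - expected_grad

# Spread of the per-mask gradients around their mean (the update noise).
noise_var = sum(p * np.sum((grad_for_input(w, xt, y) - expected_grad) ** 2)
                for p, xt in enumerate_masks(x, keep_prob))
```

In this toy setting `gap` matches `reg`, which is the explicit effect, while `noise_mean` vanishes and `noise_var` is positive, which is the stochasticity that the abstract identifies as the source of the implicit effect. The regularizers in the paper itself are stated in terms of derivatives of the model and the loss for deep networks and are not reproduced here.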

Code Implementations: 1 repo