LG · ML · May 27, 2019

On approximating dropout noise injection

arXiv:1905.11320v2 · 1 citation
Originality: Incremental advance
AI Analysis

This work identifies a foundational flaw in prior theoretical analyses of dropout regularization, a result relevant to researchers in machine learning theory.

The paper reveals that the established equivalence between dropout noise injection and L2 regularization for logistic regression relies on a divergent Taylor expansion, invalidating subsequent comparisons with standard regularizers, and extends this finding to neural networks with cross-entropy prediction layers.

This paper examines the assumptions of the derived equivalence between dropout noise injection and $L_2$ regularisation for logistic regression with negative log loss. We show that the approximation method is based on a divergent Taylor expansion, which leaves subsequent work that uses this approximation to compare the dropout-trained logistic regression model with standard regularisers ill-founded to date. Moreover, the approximation approach is shown to be invalid using any robust constraints. We show how this finding extends to general neural network topologies that use a cross-entropy prediction layer.
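As a rough illustration of how a Taylor expansion of this kind can diverge, the sketch below expands the logistic log-partition function $\log(1 + e^z)$ (softplus) about $z = 0$ and compares partial sums with the exact value. This function is analytic on the real line but has complex singularities at $z = \pm i\pi$, so its Taylor series about 0 converges only for $|z| < \pi$. Whether this is exactly the divergence mechanism analysed in the paper is an assumption here; the function names `softplus_taylor_coeffs` and `taylor_partial_sum` and the choice of expansion point are illustrative and not taken from the paper.

```python
from fractions import Fraction
import math


def softplus(z):
    """Exact softplus, log(1 + e^z)."""
    return math.log1p(math.exp(z))


def softplus_taylor_coeffs(order):
    """Exact Taylor coefficients of softplus about z = 0 for degrees 1..order.

    Each derivative of softplus is a polynomial in s = sigmoid(z), and
    d/dz p(s) = p'(s) * s * (1 - s). At z = 0, s = 1/2 exactly, so rational
    arithmetic avoids any floating-point cancellation.
    """
    s = Fraction(1, 2)
    poly = [Fraction(0), Fraction(1)]  # first derivative is s itself
    coeffs = []
    for n in range(1, order + 1):
        value = sum(c * s**k for k, c in enumerate(poly))
        coeffs.append(value / math.factorial(n))
        # Differentiate the polynomial in s, then multiply by (s - s^2).
        deriv = [k * c for k, c in enumerate(poly)][1:]
        nxt = [Fraction(0)] * (len(deriv) + 2)
        for k, c in enumerate(deriv):
            nxt[k + 1] += c
            nxt[k + 2] -= c
        poly = nxt
    return coeffs


def taylor_partial_sum(z, order):
    """Partial sum of the Taylor series about 0, truncated at the given degree."""
    coeffs = softplus_taylor_coeffs(order)
    return math.log(2.0) + sum(float(c) * z**n for n, c in enumerate(coeffs, start=1))


if __name__ == "__main__":
    for z in (2.0, 4.0):  # inside and outside the radius of convergence pi
        for order in (2, 20, 60):
            err = abs(taylor_partial_sum(z, order) - softplus(z))
            print(f"z={z}, order={order}, abs error={err:.3e}")
```

Inside the radius of convergence (here $z = 2$), adding terms shrinks the error; outside it (here $z = 4$), adding terms makes the approximation worse. A second-order truncation can look reasonable at either point, which is why a low-order approximation alone does not reveal that the underlying series fails to converge.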

