ML LG ST

Jul 11, 2014

Altitude Training: Strong Bounds for Single-Layer Dropout

Stefan Wager, William Fithian, Sida Wang, Percy Liang

arXiv:1407.3289v2

50 citations

AI Analysis

The paper gives a theoretical explanation for why dropout works well on natural language tasks, a question about its generalization behavior that previously lacked a formal answer.

It studies dropout training on high-dimensional single-layer natural language tasks and shows that, under a generative Poisson topic model with long documents, dropout improves the exponent in the generalization bound for empirical risk minimization. As a result, a classifier that performs reasonably well on dropout-corrupted training examples is expected to perform very well on the uncorrupted test set.

Dropout training, originally designed for deep neural networks, has been successful on high-dimensional single-layer natural language tasks. This paper proposes a theoretical explanation for this phenomenon: we show that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization. Dropout achieves this gain much like a marathon runner who practices at altitude: once a classifier learns to perform reasonably well on training examples that have been artificially corrupted by dropout, it will do very well on the uncorrupted test set. We also show that, under similar conditions, dropout preserves the Bayes decision boundary and should therefore induce minimal bias in high dimensions.
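To make the setting in the abstract concrete, the following is a minimal sketch of dropout training for a single-layer classifier (logistic regression on word counts). It is an illustration only and not the authors' code or experimental setup: the function names (`dropout`, `train_dropout_logreg`, `predict`), the toy count data, the plain SGD loop, and the rescaling of surviving features by 1/(1 - delta) are all assumptions made for this example. The classifier is trained only on corrupted inputs and then evaluated on the uncorrupted ones.

```python
import math
import random


def dropout(x, delta, rng):
    """Zero each feature independently with probability delta.

    Surviving features are rescaled by 1 / (1 - delta) so the corrupted
    vector equals x in expectation (a common convention, assumed here).
    """
    keep = 1.0 - delta
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_dropout_logreg(X, y, delta=0.5, epochs=20, lr=0.01, seed=0):
    """SGD on the logistic loss, drawing a fresh dropout mask per example."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            x_tilde = dropout(x, delta, rng)  # corrupted training input
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x_tilde)))
            g = p - label  # derivative of the logistic loss w.r.t. the score
            for j, xj in enumerate(x_tilde):
                w[j] -= lr * g * xj
    return w


def predict(w, x):
    """Classify an uncorrupted input: no dropout is applied at test time."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


# Hypothetical toy "documents": word counts over a 4-word vocabulary.
# Class 1 uses words 0-1, class 0 uses words 2-3.
X = [[3.0, 2.0, 0.0, 0.0], [2.0, 4.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 1.0], [0.0, 0.0, 2.0, 3.0]]
y = [1, 1, 0, 0]

w = train_dropout_logreg(X, y)
predictions = [predict(w, x) for x in X]
```

The mask is applied only inside the training loop; `predict` sees the original counts, which mirrors the "train at altitude, race at sea level" picture in the abstract. A toy example like this cannot demonstrate the generalization bound itself, which depends on the Poisson topic model and long-document assumptions.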
