On Convergence and Generalization of Dropout Training
This paper provides theoretical guarantees for dropout training in neural networks. The contribution is incremental, as it builds on existing work.
The paper analyzes dropout training in two-layer ReLU networks, showing that, under mild overparametrization and a positive-margin separability assumption on the limiting kernel, dropout training with logistic loss achieves ε-suboptimal test error in O(1/ε) iterations.
We study dropout in two-layer neural networks with rectified linear unit (ReLU) activations. Under mild overparametrization and assuming that the limiting kernel can separate the data distribution with a positive margin, we show that dropout training with logistic loss achieves $\epsilon$-suboptimality in test error in $O(1/\epsilon)$ iterations.
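To make the setting concrete, the following is a minimal sketch of dropout training on a two-layer ReLU network with logistic loss, written in plain Python. Several details are assumptions made for illustration and are not stated in the abstract: the 1/sqrt(m) output scaling, a fixed random ±1 top layer with only the hidden weights trained, Gaussian initialization, a Bernoulli keep probability q with 1/q rescaling of kept units, single-sample SGD, and the hypothetical helpers make_toy_data and predict. The hyperparameters are arbitrary, and the sketch does not attempt to reproduce or verify the O(1/ε) rate.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(W, a, x, mask, q):
    """Network output; hidden unit r contributes only if mask[r] is True."""
    total = 0.0
    for w_r, a_r, keep in zip(W, a, mask):
        if keep:
            pre = sum(wi * xi for wi, xi in zip(w_r, x))
            total += a_r * max(pre, 0.0) / q  # ReLU, rescaled by 1/q
    return total / math.sqrt(len(W))  # assumed 1/sqrt(m) scaling


def dropout_sgd(data, m=64, q=0.5, lr=0.5, steps=2000, seed=0):
    """SGD on the logistic loss with a fresh dropout mask at every step."""
    rng = random.Random(seed)
    d = len(data[0][0])
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.choice((-1.0, 1.0)) for _ in range(m)]  # fixed top layer (assumption)
    for _ in range(steps):
        x, y = rng.choice(data)
        mask = [rng.random() < q for _ in range(m)]  # keep each unit w.p. q
        out = forward(W, a, x, mask, q)
        # Loss log(1 + exp(-y * out)); its derivative w.r.t. out:
        g = -y * sigmoid(-y * out)
        scale = lr * g / (q * math.sqrt(m))
        for r in range(m):
            # Only kept units with a positive pre-activation receive gradient.
            if mask[r] and sum(wi * xi for wi, xi in zip(W[r], x)) > 0.0:
                W[r] = [wi - scale * a[r] * xi for wi, xi in zip(W[r], x)]
    return W, a


def predict(W, a, x):
    """Test-time prediction: full network, no dropout, no rescaling."""
    return 1 if forward(W, a, x, [True] * len(W), 1.0) >= 0 else -1


def make_toy_data(n=40, seed=1):
    """Hypothetical separable data: label y, first coordinate equal to y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        data.append(((float(y), rng.uniform(-0.5, 0.5)), y))
    return data
```

Calling `W, a = dropout_sgd(make_toy_data())` and then `predict(W, a, x)` on a point gives a label in {-1, +1}. The toy data is separable with a large margin, which loosely mirrors the margin assumption in the abstract, although here the margin is in input space rather than in the limiting kernel's feature space.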