LG · Feb 19, 2018

On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

arXiv:1802.06509v2 · 555 citations

AI Analysis

This work addresses optimization challenges in deep learning and offers researchers a counterintuitive insight: depth can speed up training. Its contribution is incremental, since the analysis is confined to a specific theoretical model.

The paper studies deep network optimization and shows that, in overparameterized linear neural networks, increasing depth can accelerate convergence because depth acts as a preconditioner. Theory and experiments indicate that, for tasks such as linear regression with ℓ_p loss (p>2), gradient descent can gain more from this effect than from some common acceleration schemes.

Conventional wisdom in deep learning states that increasing depth improves expressiveness but complicates optimization. This paper suggests that, sometimes, increasing depth can speed up optimization. The effect of depth on optimization is decoupled from expressiveness by focusing on settings where additional layers amount to overparameterization: linear neural networks, a well-studied model. Theoretical analysis, as well as experiments, show that here depth acts as a preconditioner which may accelerate convergence. Even on simple convex problems such as linear regression with $\ell_p$ loss, $p>2$, gradient descent can benefit from transitioning to a non-convex overparameterized objective, more than it would from some common acceleration schemes. We also prove that it is mathematically impossible to obtain the acceleration effect of overparameterization via gradients of any regularizer.
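
The transition from a convex objective to a non-convex overparameterized one can be made concrete with a small experiment. The sketch below is illustrative only and is not the authors' code. It assumes one simple form of overparameterization, replacing the weight vector $w$ with a product $w_1 \cdot w_2$ in which $w_2$ is a scalar, and runs gradient descent on both objectives for $\ell_4$ regression. The function names, the synthetic data, the step size, and the initialization $w_2 = 1$ are all hypothetical choices made for this example.

```python
import numpy as np


def lp_loss(w, X, y, p=4):
    """l_p regression loss: mean of |x_i . w - y_i|^p, divided by p."""
    r = X @ w - y
    return np.mean(np.abs(r) ** p) / p


def lp_grad(w, X, y, p=4):
    """Gradient of lp_loss with respect to w."""
    r = X @ w - y
    return X.T @ (np.sign(r) * np.abs(r) ** (p - 1)) / len(y)


def gd_direct(X, y, w0, lr=0.01, steps=500, p=4):
    """Plain gradient descent on the convex objective in w."""
    w = w0.copy()
    losses = [lp_loss(w, X, y, p)]
    for _ in range(steps):
        w = w - lr * lp_grad(w, X, y, p)
        losses.append(lp_loss(w, X, y, p))
    return w, losses


def gd_overparam(X, y, w1_0, w2_0, lr=0.01, steps=500, p=4):
    """Gradient descent on the non-convex objective in (w1, w2), where w = w1 * w2."""
    w1, w2 = w1_0.copy(), float(w2_0)
    losses = [lp_loss(w1 * w2, X, y, p)]
    for _ in range(steps):
        g = lp_grad(w1 * w2, X, y, p)  # gradient w.r.t. the end-to-end vector
        g1, g2 = g * w2, g @ w1        # chain rule: vector w1, scalar w2
        w1, w2 = w1 - lr * g1, w2 - lr * g2
        losses.append(lp_loss(w1 * w2, X, y, p))
    return w1 * w2, losses


def make_data(seed=0, m=200, d=5):
    """Synthetic regression problem (an assumption, not the paper's data)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, d))
    w_true = rng.standard_normal(d)
    w_true /= np.linalg.norm(w_true)
    y = X @ w_true + 0.1 * rng.standard_normal(m)
    w0 = 0.01 * rng.standard_normal(d)  # small initialization
    return X, y, w0


if __name__ == "__main__":
    X, y, w0 = make_data()
    _, direct = gd_direct(X, y, w0)
    # Start both runs from the same end-to-end vector: w1 = w0, w2 = 1.
    _, over = gd_overparam(X, y, w0, 1.0)
    print(f"direct:            {direct[0]:.4f} -> {direct[-1]:.4f}")
    print(f"overparameterized: {over[0]:.4f} -> {over[-1]:.4f}")
```

Both runs start from the same end-to-end vector and use the same step size, so any difference between the two loss curves comes from the parameterization alone. Whether the overparameterized run is faster, and by how much, depends on the initialization and step size chosen here.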
