Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation
This work addresses the theoretical explanation of generalization in machine learning, in particular for stochastic training algorithms. It is, however, incremental: it builds on existing implicit bias research and studies a specific model.
The paper tackles the problem of understanding implicit bias in overparametrized neural networks by analyzing the effect of label noise on the training dynamics of a quadratically parametrized model. It proves that the resulting stochastic gradient flow implicitly solves a Lasso program, and it provides nonasymptotic convergence guarantees and conditions for support recovery.
Understanding the implicit bias of training algorithms is crucial to explaining the success of overparametrised neural networks. In this paper, we study the role of label noise in the training dynamics of a quadratically parametrised model through its continuous-time version. We explicitly characterise the solution selected by the stochastic flow and prove that it implicitly solves a Lasso program. To complete our analysis, we provide nonasymptotic convergence guarantees for the dynamics as well as conditions for support recovery. We also present experimental results that support our theoretical claims. Our findings highlight that structured noise can induce better generalisation, and they help explain the better performance of stochastic dynamics observed in practice.
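To make the mechanism concrete, here is a minimal NumPy sketch of label-noise SGD on a quadratically parametrised linear model. It is not the authors' code, and several choices are assumptions made only for illustration: the parametrisation β = u ⊙ u (which keeps β nonnegative), Rademacher label noise of size `sigma`, one sample per step, and the hypothetical function name `label_noise_gd` with its default step size, initialisation scale and step count. The result stated above concerns the continuous-time stochastic flow, whose solution implicitly solves a Lasso program, i.e. a least-squares fit penalised by λ‖β‖₁; this discrete sketch only illustrates the dynamics and is not guaranteed to land on that Lasso solution for any particular step size or noise level.

```python
import numpy as np


def label_noise_gd(X, y, sigma=0.5, lr=0.005, alpha=0.1, num_steps=20_000, seed=0):
    """Hypothetical sketch: label-noise SGD on the assumed parametrisation beta = u * u.

    Each step draws one sample i, replaces its label by y[i] + sigma * eps with
    eps a Rademacher (+1/-1) variable, and takes a gradient step on the single
    noisy squared loss 0.5 * (<x_i, u * u> - noisy_label) ** 2.
    Returns beta = u * u, which is nonnegative by construction.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    u = np.full(d, alpha, dtype=float)  # small uniform initialisation (assumption)
    for _ in range(num_steps):
        i = rng.integers(n)                # pick one training sample
        eps = 2.0 * rng.integers(2) - 1.0  # Rademacher noise: +1.0 or -1.0
        noisy_label = y[i] + sigma * eps
        residual = X[i] @ (u * u) - noisy_label
        # Gradient w.r.t. u of 0.5 * residual**2 is residual * 2 * x_i * u.
        u -= lr * residual * 2.0 * X[i] * u
    return u * u


if __name__ == "__main__":
    # Hypothetical sparse, nonnegative ground truth with fewer samples than features.
    rng = np.random.default_rng(1)
    n, d = 40, 100
    X = rng.standard_normal((n, d))
    beta_star = np.zeros(d)
    beta_star[:3] = 1.0
    y = X @ beta_star

    beta_noise = label_noise_gd(X, y, sigma=0.5)
    beta_plain = label_noise_gd(X, y, sigma=0.0)
    for name, beta in [("label noise", beta_noise), ("no noise", beta_plain)]:
        off_support = np.abs(beta[3:]).sum()
        print(f"{name}: support {np.round(beta[:3], 3)}, off-support l1 mass {off_support:.3f}")
```

Calling the same function with `sigma=0.0` gives plain SGD on the same model from the same initialisation, which serves as a baseline for comparing how the off-support coordinates behave with and without label noise.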