More is Less: Inducing Sparsity via Overparameterization
This addresses the implicit bias in overparameterized neural networks for sparse recovery, offering a theoretical explanation and practical improvement in compressed sensing, though it is incremental as it builds on prior work.
The paper tackles the problem of sparse recovery in compressed sensing by overparameterizing the loss function and using gradient flow, showing that it converges to a good approximation of the minimal ℓ₁-norm solution, which promotes sparsity, and significantly improves sample complexity compared to previous works.
In deep learning it is common to overparameterize neural networks, that is, to use more parameters than training samples. Quite surprisingly training the neural network via (stochastic) gradient descent leads to models that generalize very well, while classical statistics would suggest overfitting. In order to gain understanding of this implicit bias phenomenon we study the special case of sparse recovery (compressed sensing) which is of interest on its own. More precisely, in order to reconstruct a vector from underdetermined linear measurements, we introduce a corresponding overparameterized square loss functional, where the vector to be reconstructed is deeply factorized into several vectors. We show that, if there exists an exact solution, vanilla gradient flow for the overparameterized loss functional converges to a good approximation of the solution of minimal $\ell_1$-norm. The latter is well-known to promote sparse solutions. As a by-product, our results significantly improve the sample complexity for compressed sensing via gradient flow/descent on overparameterized models derived in previous works. The theory accurately predicts the recovery rate in numerical experiments. Our proof relies on analyzing a certain Bregman divergence of the flow. This bypasses the obstacles caused by non-convexity and should be of independent interest.