Sparse Optimization on Measures with Over-parameterized Gradient Descent
The method offers a more efficient algorithm for sparse optimization problems in fields such as signal processing and machine learning, though the contribution is incremental in that it builds on existing gradient descent and measure-space methods.
The paper tackles the problem of minimizing a convex function of a measure with a sparsity-inducing penalty, as arises in sparse spikes deconvolution or two-layer neural network training. It shows that discretizing the measure and running non-convex gradient descent on the particle positions and weights yields a global optimization algorithm. For measures on a d-dimensional manifold and under non-degeneracy assumptions, its complexity scales as log(1/ε) in the desired accuracy ε, instead of ε^{-d} for convex methods.
Minimizing a convex function of a measure with a sparsity-inducing penalty is a typical problem arising, e.g., in sparse spikes deconvolution or two-layer neural network training. We show that this problem can be solved by discretizing the measure and running non-convex gradient descent on the positions and weights of the particles. For measures on a $d$-dimensional manifold and under some non-degeneracy assumptions, this leads to a global optimization algorithm with a complexity scaling as $\log(1/ε)$ in the desired accuracy $ε$, instead of $ε^{-d}$ for convex methods. The key theoretical tools are a local convergence analysis in Wasserstein space and an analysis of a perturbed mirror descent in the space of measures. Our bounds involve quantities that are exponential in $d$, which is unavoidable under our assumptions.
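To make the idea of discretizing the measure and descending on particle positions and weights concrete, here is a minimal sketch for a 1D sparse spikes deconvolution toy problem. It is an illustration under assumptions, not the paper's algorithm: the Gaussian kernel, the restriction to nonnegative weights via $w_i = r_i^2$, the plain Euclidean gradient steps with two separate step sizes, and all function and parameter names (`gaussian_features`, `particle_gradient_descent`, `step_r`, `step_theta`) are hypothetical choices made for this example.

```python
import numpy as np

def gaussian_features(theta, grid, sigma):
    """Column i is a Gaussian bump centered at theta[i], sampled on the grid."""
    return np.exp(-0.5 * ((grid[:, None] - theta[None, :]) / sigma) ** 2)

def particle_gradient_descent(grid, y, m=30, sigma=0.05, lam=1e-3,
                              step_r=0.5, step_theta=0.01, n_iter=2000):
    """Plain gradient descent on an over-parameterized discrete measure.

    The measure is sum_i w_i * delta_{theta_i} with w_i = r_i**2 >= 0
    (an assumption of this sketch). The objective is
    0.5 * mean((Phi w - y)**2) + lam * sum(w).
    """
    n = len(grid)
    theta = np.linspace(grid[0], grid[-1], m)  # many particles, spread uniformly
    r = np.full(m, 0.1)                        # small uniform initial weights
    history = np.empty(n_iter + 1)
    for k in range(n_iter + 1):
        phi = gaussian_features(theta, grid, sigma)
        w = r ** 2
        residual = phi @ w - y
        history[k] = 0.5 * np.mean(residual ** 2) + lam * np.sum(w)
        if k == n_iter:
            break
        # Gradient with respect to each weight w_i.
        grad_w = phi.T @ residual / n + lam
        # d(phi_i)/d(theta_i) = phi_i * (x - theta_i) / sigma**2.
        dphi = phi * (grid[:, None] - theta[None, :]) / sigma ** 2
        grad_theta = w * (dphi.T @ residual) / n
        # Chain rule through w_i = r_i**2 gives dF/dr_i = 2 * r_i * dF/dw_i.
        r = r - step_r * 2.0 * r * grad_w
        theta = theta - step_theta * grad_theta
    return r ** 2, theta, history

# Toy data: two spikes observed through the Gaussian kernel (no noise).
grid = np.linspace(0.0, 1.0, 200)
true_theta = np.array([0.3, 0.7])
true_w = np.array([1.0, 0.6])
y = gaussian_features(true_theta, grid, 0.05) @ true_w

weights, positions, history = particle_gradient_descent(grid, y)
print("objective at start and end:", history[0], history[-1])
print("positions of the heaviest particles:",
      np.sort(positions[np.argsort(weights)[-4:]]))
```

The sketch uses far more particles than true spikes, which is the over-parameterization the abstract refers to; in this toy setting one would expect the particles that retain non-negligible weight to gather around the true spike locations, but the example makes no claim about the convergence rates established in the paper.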