Simultaneous off-the-grid learning of mixtures issued from a continuous dictionary
This work addresses signal processing and machine learning tasks involving continuous dictionaries, offering theoretical guarantees for off-the-grid estimation, but it is incremental as it builds on existing geometry-based methods.
The paper tackles the problem of estimating signals corrupted by noise, where each signal is a mixture of features from a continuous dictionary with unknown parameters, by formulating a regularized optimization problem called Group-Nonlinear-Lasso. It provides high-probability bounds on prediction error, showing that for $p=2$, the rates match those of Group-Lasso in multi-task linear regression and are faster than for $p=1$ when signals share most parameters.
In this paper we observe a set, possibly a continuum, of signals corrupted by noise. Each signal is a finite mixture of an unknown number of features belonging to a continuous dictionary. The continuous dictionary is parametrized by a real non-linear parameter. We assume that the signals share an underlying structure: the active features of every signal are included in a common finite and sparse set. We formulate a regularized optimization problem to estimate simultaneously the linear coefficients in the mixtures and the non-linear parameters of the features. The optimization problem is composed of a data fidelity term and an $(\ell_1,L^p)$-penalty. We call its solution the Group-Nonlinear-Lasso and provide high-probability bounds on the prediction error using certificate functions. Following recent works on the geometry of off-the-grid methods, we show that such functions can be constructed provided the parameters of the active features are pairwise separated by a constant with respect to a Riemannian metric. When the number of signals is finite and the noise is assumed Gaussian, we give refinements of our results for $p=1$ and $p=2$ using tail bounds on suprema of Gaussian and $\chi^2$ random processes. When $p=2$, our prediction error reaches the rates obtained by the Group-Lasso estimator in the multi-task linear regression model. Furthermore, for $p=2$ these prediction rates are faster than for $p=1$ when all signals share most of the non-linear parameters.
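As a rough illustration of the kind of objective described above, the following Python sketch evaluates a data fidelity term plus an $(\ell_1,L^p)$-penalty for a finite number of signals sampled on a common grid. The Gaussian-bump dictionary, the normalization of the fidelity term, and all names (`feature`, `group_penalty`, `objective`) are assumptions made for illustration only; they are not taken from the paper, and the sketch only evaluates the criterion rather than minimizing it.

```python
import math


def feature(theta, t, width=0.1):
    """Hypothetical dictionary element: a Gaussian bump centred at theta."""
    return math.exp(-0.5 * ((t - theta) / width) ** 2)


def group_penalty(B, p):
    """(l1, L^p)-type penalty for finitely many signals.

    B[i][k] is the linear coefficient of feature k in signal i.
    For each feature k, take the p-norm of its coefficients across
    signals, then sum these norms over the features.
    """
    n_features = len(B[0])
    total = 0.0
    for k in range(n_features):
        column = [abs(row[k]) for row in B]
        total += sum(c ** p for c in column) ** (1.0 / p)
    return total


def objective(Y, grid, B, thetas, lam, p=2):
    """Data fidelity plus lam times the group penalty.

    Y[i][j] is signal i observed at grid[j]; thetas[k] is the
    non-linear parameter of feature k, shared by all signals.
    The 1 / (2 * n_signals * n_points) scaling is an assumed convention.
    """
    fit = 0.0
    for y, b in zip(Y, B):
        for t, y_t in zip(grid, y):
            pred = sum(b_k * feature(th, t) for b_k, th in zip(b, thetas))
            fit += (y_t - pred) ** 2
    fit /= 2 * len(Y) * len(grid)
    return fit + lam * group_penalty(B, p)
```

In this sketch every signal uses the same list `thetas`, which mirrors the shared sparse set of active features. For a given coefficient array, the $p=2$ penalty is never larger than the $p=1$ penalty, since the Euclidean norm of each feature's coefficients across signals is bounded by the sum of their absolute values.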