Between hard and soft thresholding: optimal iterative thresholding algorithms
This work addresses optimization efficiency for sparsity-constrained problems in machine learning, offering a theoretical improvement over existing iterative thresholding methods.
The paper tackles the problem of optimizing a differentiable objective under sparsity constraints by analyzing thresholding operators, and finds that the commonly used hard and soft thresholding operators are suboptimal in terms of worst-case convergence guarantees. It identifies a class of thresholding operators lying between the two, including $\ell_q$ thresholding and a newly defined reciprocal thresholding operator, that achieves the optimal convergence guarantee and, in sparse linear regression, attains the optimal rate for computationally efficient estimators, matching the Lasso.
Iterative thresholding algorithms seek to optimize a differentiable objective function over a sparsity or rank constraint by alternating between gradient steps that reduce the objective and thresholding steps that enforce the constraint. This work examines the choice of the thresholding operator and asks whether it is possible to achieve stronger guarantees than hard thresholding provides. We develop the notion of relative concavity of a thresholding operator, a quantity that characterizes the worst-case convergence performance of any thresholding operator on the target optimization problem. Surprisingly, we find that commonly used thresholding operators, such as hard thresholding and soft thresholding, are suboptimal in terms of worst-case convergence guarantees. Instead, a general class of thresholding operators, lying between hard thresholding and soft thresholding, is shown to be optimal, with the strongest possible convergence guarantee among all thresholding operators. Examples of this general class include $\ell_q$ thresholding with appropriate choices of $q$, and a newly defined reciprocal thresholding operator. We also investigate the implications of the improved optimization guarantee in the statistical setting of sparse linear regression, and show that this new class of thresholding operators attains the optimal rate for computationally efficient estimators, matching the Lasso.
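To make the alternating scheme concrete, the following is a minimal NumPy sketch of iterative thresholding for sparse least squares, $f(\beta) = \tfrac12\|y - X\beta\|_2^2$ subject to $\|\beta\|_0 \le k$, using the update $\beta^{t+1} = \varphi(\beta^t - \eta \nabla f(\beta^t))$ with a pluggable $k$-sparse thresholding operator $\varphi$. Several pieces are assumptions for illustration and do not come from the paper. The function names `hard_threshold`, `soft_threshold_k` and `iterative_thresholding` are hypothetical. The step size $\eta = 1/\|X\|_2^2$ is likewise an assumption. The "soft" variant shrinks every entry by the $(k+1)$-th largest magnitude, which is one common way to make soft thresholding $k$-sparse and may differ from the paper's exact definition. The paper's $\ell_q$ and reciprocal thresholding operators are not implemented here.

```python
import numpy as np


def hard_threshold(z, k):
    """Keep the k largest-magnitude entries of z and zero out the rest."""
    assert 1 <= k <= z.size
    out = np.zeros_like(z, dtype=float)
    idx = np.argsort(np.abs(z))[-k:]  # indices of the k largest |z_i|
    out[idx] = z[idx]
    return out


def soft_threshold_k(z, k):
    """Illustrative k-sparse soft thresholding (assumed definition):
    shrink every entry toward zero by the (k+1)-th largest magnitude.
    Only entries strictly above that level survive, so at most k are nonzero."""
    assert 1 <= k <= z.size
    mags = np.sort(np.abs(z))[::-1]          # magnitudes in descending order
    lam = mags[k] if k < z.size else 0.0     # (k+1)-th largest magnitude
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def iterative_thresholding(X, y, k, threshold, step=None, n_iter=500):
    """Approximately minimize 0.5 * ||y - X beta||^2 subject to ||beta||_0 <= k
    by alternating a gradient step with a thresholding step."""
    n, p = X.shape
    if step is None:
        # Assumed step size 1/L, where L = ||X||_2^2 is the Lipschitz
        # constant of the gradient (largest singular value squared).
        step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)               # gradient step direction
        beta = threshold(beta - step * grad, k)   # enforce the sparsity constraint
    return beta


if __name__ == "__main__":
    # Synthetic sparse regression problem (illustrative data, not from the paper).
    rng = np.random.default_rng(0)
    n, p, k = 200, 50, 5
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:k] = 3.0
    y = X @ beta_true + 0.1 * rng.standard_normal(n)
    for op in (hard_threshold, soft_threshold_k):
        beta_hat = iterative_thresholding(X, y, k, op)
        print(op.__name__, np.flatnonzero(beta_hat))
```

Passing the operator as an argument keeps the gradient step fixed and isolates the one design choice the abstract studies, so other $k$-sparse operators can be swapped in without changing the loop. Note that soft thresholding biases the surviving coefficients toward zero, whereas hard thresholding leaves them unshrunk.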