Can Learning Be Explained By Local Optimality In Robust Low-rank Matrix Recovery?
The result offers theoretical insight for researchers in optimization and machine learning: strict saddle points can be desirable in certain contexts, an incremental but important clarification.
The paper asks whether learning in robust low-rank matrix recovery can be explained by the ground truth matrix being a local optimum of the loss. It finds that, under moderate assumptions, the ground truth instead emerges as a strict saddle point, challenging the belief that such points are undesirable.
We explore the local landscape of low-rank matrix recovery, focusing on reconstructing a $d_1\times d_2$ matrix $X^\star$ with rank $r$ from $m$ linear measurements, some potentially noisy. When the noise is distributed according to an outlier model, minimizing a nonsmooth $\ell_1$-loss with a simple sub-gradient method can often perfectly recover the ground truth matrix $X^\star$. Given this, a natural question is what optimization property (if any) enables such learning behavior. The most plausible answer is that the ground truth $X^\star$ manifests as a local optimum of the loss function. In this paper, we provide a strong negative answer to this question, showing that, under moderate assumptions, the true solutions corresponding to $X^\star$ do not emerge as local optima, but rather as strict saddle points -- critical points with strictly negative curvature in at least one direction. Our findings challenge the conventional belief that all strict saddle points are undesirable and should be avoided.
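The setting described above can be sketched numerically. The following is a minimal illustration, not the paper's experimental setup: it assumes a factorized parametrization $X = UV^\top$, Gaussian measurement matrices, sparse Gaussian outliers, a small random initialization, and geometrically decaying step sizes. The function names (`make_problem`, `l1_loss`, `subgradient`, `subgradient_method`) and all default parameters are hypothetical choices made for this example.

```python
import numpy as np


def make_problem(d1, d2, r, m, outlier_frac, seed=0):
    """Synthetic instance; every modelling choice here is an assumption."""
    rng = np.random.default_rng(seed)
    U_star = rng.standard_normal((d1, r))
    V_star = rng.standard_normal((d2, r))
    A = rng.standard_normal((m, d1, d2))  # Gaussian measurement matrices A_k
    # Clean measurements y_k = <A_k, X*> with X* = U* V*^T of rank r.
    y = np.einsum("kij,ij->k", A, U_star @ V_star.T)
    # Outlier model (assumed): corrupt a random subset of the measurements.
    n_out = int(outlier_frac * m)
    idx = rng.choice(m, size=n_out, replace=False)
    y[idx] += 10.0 * rng.standard_normal(n_out)
    return A, y, U_star, V_star


def l1_loss(U, V, A, y):
    """Nonsmooth loss (1/m) * sum_k |<A_k, U V^T> - y_k|."""
    residual = np.einsum("kij,ij->k", A, U @ V.T) - y
    return np.mean(np.abs(residual))


def subgradient(U, V, A, y):
    """One sub-gradient of the l1 loss with respect to (U, V), via the chain rule.

    np.sign returns 0 at a zero residual, which is a valid element of the
    subdifferential of |.| at 0.
    """
    residual = np.einsum("kij,ij->k", A, U @ V.T) - y
    S = np.einsum("k,kij->ij", np.sign(residual), A) / len(y)
    return S @ V, S.T @ U


def subgradient_method(A, y, r, steps=300, eta0=0.1, decay=0.98,
                       init_scale=1e-2, seed=1):
    """Plain sub-gradient iteration with step size eta0 * decay**t (assumed schedule)."""
    rng = np.random.default_rng(seed)
    _, d1, d2 = A.shape
    U = init_scale * rng.standard_normal((d1, r))
    V = init_scale * rng.standard_normal((d2, r))
    history = [l1_loss(U, V, A, y)]
    for t in range(steps):
        gU, gV = subgradient(U, V, A, y)
        eta = eta0 * decay**t
        U, V = U - eta * gU, V - eta * gV
        history.append(l1_loss(U, V, A, y))
    return U, V, history
```

To experiment, one could call `make_problem(10, 10, 2, 400, 0.1)`, run `subgradient_method` on the returned `A` and `y`, and compare `U @ V.T` with `U_star @ V_star.T` in relative Frobenius norm. Whether the iterates approach $X^\star$ depends on the step-size schedule, the initialization, and the outlier fraction; the sub-gradient method is not a descent method, so the loss recorded in `history` need not decrease monotonically.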