Implicit Bias and Convergence of Matrix Stochastic Mirror Descent
This work addresses theoretical foundations for implicit bias in high-dimensional, multi-output machine learning problems, providing insights for researchers in optimization and learning theory, but it is incremental as it generalizes existing vector results to matrix settings.
The paper tackled the problem of understanding convergence and implicit bias in Stochastic Mirror Descent (SMD) with matrix parameters for overparameterized regimes like multi-class classification and matrix completion, proving exponential convergence to a global interpolator and showing that the algorithm converges to a unique solution minimizing the Bregman divergence from initialization while interpolating data.
We investigate Stochastic Mirror Descent (SMD) with matrix parameters and vector-valued predictions, a framework relevant to multi-class classification and matrix completion problems. Focusing on the overparameterized regime, where the total number of parameters exceeds the number of training samples, we prove that SMD with matrix mirror functions $ψ(\cdot)$ converges exponentially to a global interpolator. Furthermore, we generalize classical implicit bias results of vector SMD by demonstrating that the matrix SMD algorithm converges to the unique solution minimizing the Bregman divergence induced by $ψ(\cdot)$ from initialization subject to interpolating the data. These findings reveal how matrix mirror maps dictate inductive bias in high-dimensional, multi-output problems.