Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture
This addresses the problem of achieving stable, low-rank representations in machine learning models, which is incremental by building on existing implicit bias theories with a new architectural approach.
The paper tackled the implicit bias in matrix factorization towards low-rank solutions, even with unbounded iterates, by introducing a new constrained factorization model that yields truly low-rank solutions and extends to neural networks for competitive performance on regression and classification tasks.
Gradient descent for matrix factorization exhibits an implicit bias toward approximately low-rank solutions. While existing theories often assume the boundedness of iterates, empirically the bias persists even with unbounded sequences. This reflects a dynamic where factors develop low-rank structure while their magnitudes increase, tending to align with certain directions. To capture this behavior in a stable way, we introduce a new factorization model: $X\approx UDV^\top$, where $U$ and $V$ are constrained within norm balls, while $D$ is a diagonal factor allowing the model to span the entire search space. Experiments show that this model consistently exhibits a strong implicit bias, yielding truly (rather than approximately) low-rank solutions. Extending the idea to neural networks, we introduce a new model featuring constrained layers and diagonal components that achieves competitive performance on various regression and classification tasks while producing lightweight, low-rank representations.