Fundamental Limits of Matrix Sensing: Exact Asymptotics, Universality, and Applications
This work addresses fundamental limits in high-dimensional matrix sensing, offering theoretical insights for machine learning applications, though it is incremental in extending prior results to more general settings.
The paper tackles the matrix sensing problem for general structured matrices beyond the low-rank case, providing rigorous asymptotic equations for Bayes-optimal learning performance when the number of samples is proportional to the number of matrix entries. It rigorously establishes predictions previously obtained by non-rigorous methods for applications such as Bilinear Sequence Regression and neural networks with quadratic activations.
In the matrix sensing problem, one wishes to reconstruct a matrix from (possibly noisy) observations of its linear projections along given directions. We study this model in the high-dimensional limit. While previous works primarily focused on the recovery of low-rank matrices, we consider more general classes of structured signal matrices with potentially large rank, e.g., a product of two matrices of sizes proportional to the dimension. We provide rigorous asymptotic equations characterizing the Bayes-optimal learning performance from a number of samples which is proportional to the number of entries in the matrix. Our proof is composed of three key ingredients: $(i)$ we prove universality properties to handle structured sensing matrices, related to the "Gaussian equivalence" phenomenon in statistical learning, $(ii)$ we provide a sharp characterization of Bayes-optimal learning in generalized linear models with Gaussian data and structured matrix priors, generalizing previously studied settings, and $(iii)$ we leverage previous works on the problem of matrix denoising. The generality of our results allows for a variety of applications: notably, we mathematically establish predictions obtained via non-rigorous methods from statistical physics in [ETB+24] regarding Bilinear Sequence Regression, a benchmark model for learning from sequences of tokens, and in [MTM+24] on Bayes-optimal learning in neural networks with quadratic activation function and width proportional to the dimension.
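
To make the observation model concrete, the following is a minimal NumPy sketch of one possible instance of matrix sensing. It is not taken from the paper: the Gaussian sensing matrices, the normalizations by sqrt(m) and d, the additive Gaussian noise, and the names `simulate_matrix_sensing`, `least_squares_estimate`, `width_ratio`, and `sample_ratio` are all illustrative assumptions. The signal is a product of two d x m matrices with m proportional to d, and the number of samples n is proportional to d^2, matching the regime described above.

```python
import numpy as np


def simulate_matrix_sensing(d=20, width_ratio=0.5, sample_ratio=0.5,
                            noise_std=0.1, seed=0):
    """Draw one hypothetical matrix sensing instance.

    Assumptions (not from the paper): Gaussian factors, Gaussian sensing
    matrices, additive Gaussian noise, and the scalings used below.
    """
    rng = np.random.default_rng(seed)
    m = max(1, int(width_ratio * d))      # inner width, proportional to d
    n = int(sample_ratio * d * d)         # samples, proportional to d^2

    # Structured signal: product of two d x m matrices (rank up to m).
    U = rng.standard_normal((d, m))
    V = rng.standard_normal((d, m))
    S = U @ V.T / np.sqrt(m)

    # Sensing directions A_i and observations y_i = Tr(A_i^T S) / d + noise.
    A = rng.standard_normal((n, d, d))
    y = np.einsum("nij,ij->n", A, S) / d + noise_std * rng.standard_normal(n)
    return S, A, y


def least_squares_estimate(A, y):
    """Naive baseline (NOT the Bayes-optimal estimator): minimum-norm
    least-squares solution that ignores the structure of the signal."""
    n, d, _ = A.shape
    X = A.reshape(n, d * d) / d
    s_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    return s_hat.reshape(d, d)


if __name__ == "__main__":
    S, A, y = simulate_matrix_sensing(d=20, sample_ratio=0.5, noise_std=0.1)
    S_hat = least_squares_estimate(A, y)
    mse = np.mean((S_hat - S) ** 2)
    print(f"n = {len(y)}, d^2 = {S.size}, baseline MSE per entry = {mse:.3f}")
```

The least-squares baseline treats the unknown as an unstructured vector of d^2 entries, so it can only recover the signal exactly once n is at least d^2 and the noise vanishes. The Bayes-optimal performance characterized in the paper additionally exploits the prior on the signal, which this sketch does not attempt to reproduce.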