ST | LG | QUANT-PH | Oct 14, 2021

Near optimal sample complexity for matrix and tensor normal models via geodesic convexity

arXiv:2110.07583v3 | 12 citations
Originality: Highly original
AI Analysis

This work addresses efficient covariance estimation for matrix and tensor data. Its advance is incremental: it relaxes the assumptions of prior work and provides tighter theoretical guarantees.

The paper studies the estimation of the Kronecker factors of covariance matrices in matrix and tensor normal models. It shows that the maximum likelihood estimator achieves nearly optimal sample complexity and error rates without requiring well-conditioned or sparse factors, or an accurate initial guess. The bounds are minimax optimal up to logarithmic factors for the matrix normal model and, for the largest factor and the overall covariance matrix in the tensor normal model, up to constant factors. In the same sample regimes, the flip-flop algorithm converges linearly with high probability.

The matrix normal model, i.e., the family of Gaussian matrix-variate distributions whose covariance matrices are the Kronecker product of two lower dimensional factors, is frequently used to model matrix-variate data. The tensor normal model generalizes this family to Kronecker products of three or more factors. We study the estimation of the Kronecker factors of the covariance matrix in the matrix and tensor normal models. For the above models, we show that the maximum likelihood estimator (MLE) achieves nearly optimal nonasymptotic sample complexity and nearly tight error rates in the Fisher-Rao and Thompson metrics. In contrast to prior work, our results do not rely on the factors being well-conditioned or sparse, nor do we need to assume an accurate enough initial guess. For the matrix normal model, all our bounds are minimax optimal up to logarithmic factors, and for the tensor normal model our bounds for the largest factor and for the overall covariance matrix are minimax optimal up to constant factors provided there are enough samples for any estimator to obtain constant Frobenius error. In the same regimes as our sample complexity bounds, we show that the flip-flop algorithm, a practical and widely used iterative procedure to compute the MLE, converges linearly with high probability. Our main technical insight is that, given enough samples, the negative log-likelihood function is strongly geodesically convex in the geometry on positive-definite matrices induced by the Fisher information metric. This strong convexity is determined by the expansion of certain random quantum channels.
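
To make the flip-flop iteration concrete, the following is a minimal NumPy sketch for the matrix normal model. It is an illustration under stated assumptions, not the authors' implementation: the samples are assumed to be mean-zero, the covariance is taken as kron(sigma1, sigma2) with respect to row-major vectorization, and the names `sample_matrix_normal` and `flip_flop`, the identity initialization, the trace normalization, and the stopping rule are all choices made here for illustration. Each step alternately solves the likelihood equation for one Kronecker factor while holding the other fixed.

```python
import numpy as np


def sample_matrix_normal(rng, n, sigma1, sigma2):
    """Draw n mean-zero d1 x d2 matrices whose row-major vectorization
    has covariance kron(sigma1, sigma2). Hypothetical helper for testing."""
    a = np.linalg.cholesky(sigma1)  # sigma1 = a @ a.T
    b = np.linalg.cholesky(sigma2)  # sigma2 = b @ b.T
    z = rng.standard_normal((n, sigma1.shape[0], sigma2.shape[0]))
    return a @ z @ b.T  # broadcasts over the leading sample axis


def flip_flop(samples, max_iters=100, tol=1e-10):
    """Alternating (flip-flop) updates for the Kronecker factors.

    samples: array of shape (n, d1, d2), assumed mean zero.
    Returns (sigma1, sigma2) with trace(sigma1) == d1 (illustrative
    normalization; only kron(sigma1, sigma2) is identifiable).
    """
    n, d1, d2 = samples.shape
    sigma1, sigma2 = np.eye(d1), np.eye(d2)
    for _ in range(max_iters):
        # Update the row factor: average of X @ inv(sigma2) @ X.T over samples.
        inv2 = np.linalg.inv(sigma2)
        new1 = np.einsum("nij,jk,nlk->il", samples, inv2, samples) / (n * d2)
        # Update the column factor: average of X.T @ inv(new1) @ X over samples.
        inv1 = np.linalg.inv(new1)
        new2 = np.einsum("nji,jk,nkl->il", samples, inv1, samples) / (n * d1)
        # Fix the scale ambiguity (c * sigma1, sigma2 / c) without
        # changing the Kronecker product.
        scale = np.trace(new1) / d1
        new1, new2 = new1 / scale, new2 * scale
        change = max(np.linalg.norm(new1 - sigma1), np.linalg.norm(new2 - sigma2))
        sigma1, sigma2 = new1, new2
        if change < tol:
            break
    return sigma1, sigma2
```

The sketch needs enough samples for the intermediate matrices to be invertible, and it compares estimates only through `np.kron(sigma1, sigma2)`, since the individual factors are determined only up to a reciprocal scalar.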

