Goodness-of-fit tests on manifolds
The paper provides a general statistical framework for model selection in machine learning and signal processing, addressing a foundational challenge in fitting non-linear models.
The paper tackles goodness-of-fit testing for non-linear models on manifolds. It shows that the residual from a non-linear least-squares fit follows a (possibly noncentral) chi-squared distribution whose parameters are related to the model order and the dimension of the problem, which enables applications such as determining matrix rank or the number of sources in signal demixing.
We develop a general theory of goodness-of-fit testing for non-linear models. In particular, we assume that the observations are noisy samples of a submanifold defined by a sufficiently smooth non-linear map. The observation noise is additive Gaussian. Our main result shows that the "residual" of the model fit, obtained by solving a non-linear least-squares problem, follows a (possibly noncentral) $\chi^2$ distribution. The parameters of the $\chi^2$ distribution are related to the model order and the dimension of the problem. We further present a method to select the model orders sequentially. We demonstrate the broad application of the general theory in machine learning and signal processing, including determining the rank of low-rank (possibly complex-valued) matrices and tensors from noisy, partial, or indirect observations, determining the number of sources in signal demixing, and potential applications in determining the number of hidden nodes in neural networks.
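To make the rank-determination application concrete, the following is a minimal sketch of a sequential $\chi^2$ test on a fully observed, real-valued matrix. It is an illustration under stated assumptions and not the paper's algorithm: the noise level `sigma` is assumed known, the degrees of freedom are taken to be the ambient dimension $mn$ minus the dimension $r(m+n-r)$ of the rank-$r$ matrix manifold, the least-squares fit is computed by truncated SVD, and the $\chi^2$ quantile is approximated with the Wilson-Hilferty formula. The names `chi2_quantile` and `select_rank` are hypothetical helpers introduced here.

```python
import numpy as np
from statistics import NormalDist


def chi2_quantile(dof, level):
    """Approximate chi-squared quantile (Wilson-Hilferty; not exact)."""
    z = NormalDist().inv_cdf(level)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * np.sqrt(c)) ** 3


def select_rank(Y, sigma, level=0.95):
    """Return the smallest rank whose scaled fit residual passes the test.

    Assumes i.i.d. Gaussian noise with known standard deviation `sigma`.
    """
    m, n = Y.shape
    s = np.linalg.svd(Y, compute_uv=False)
    for r in range(min(m, n)):
        # The best rank-r least-squares fit is the truncated SVD, so its
        # residual is the sum of the discarded squared singular values.
        residual = np.sum(s[r:] ** 2) / sigma**2
        # Assumed degrees of freedom: ambient dimension minus the
        # dimension of the rank-r matrix manifold.
        dof = m * n - r * (m + n - r)
        if residual <= chi2_quantile(dof, level):
            return r
    return min(m, n)


# Synthetic check: a rank-3 matrix observed in additive Gaussian noise.
rng = np.random.default_rng(0)
m, n, true_rank, sigma = 30, 20, 3, 0.1
X = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))
Y = X + sigma * rng.standard_normal((m, n))
estimated_rank = select_rank(Y, sigma, level=0.999)
print("estimated rank:", estimated_rank)
```

The loop tests candidate ranks in increasing order and stops at the first one that is not rejected, which mirrors the idea of selecting model orders sequentially. The $\chi^2$ approximation for the residual relies on the noise being small relative to the signal, so the sketch should not be expected to behave well when the smallest nonzero singular value is comparable to the noise level.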