MLLGMENov 19, 2021

Composite Goodness-of-fit Tests with Kernels

arXiv:2111.10275v523 citations
Originality Incremental advance
AI Analysis

This addresses the lack of generally applicable methods for composite goodness-of-fit testing in probabilistic models, which is crucial for practitioners in statistics and machine learning to assess model misspecification, though it appears incremental as it builds on existing kernel-based discrepancy measures.

The paper tackles the problem of determining whether data comes from any distribution within a parametric family, proposing kernel-based hypothesis tests using minimum distance estimators based on maximum mean discrepancy and kernel Stein discrepancy. The main result shows that parameter estimation and testing can be conducted on the same data without data splitting while maintaining correct test level, as illustrated on problems like testing unnormalized non-parametric density models and intractable generative models.

Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue. However, whether these more involved methods are required will depend on whether the model is really misspecified, and there is a lack of generally applicable methods to answer this question. In this paper, we propose one such method. More precisely, we propose kernel-based hypothesis tests for the challenging composite testing problem, where we are interested in whether the data comes from any distribution in some parametric family. Our tests make use of minimum distance estimators based on the maximum mean discrepancy and the kernel Stein discrepancy. They are widely applicable, including whenever the density of the parametric model is known up to normalisation constant, or if the model takes the form of a simulator. As our main result, we show that we are able to estimate the parameter and conduct our test on the same data (without data splitting), while maintaining a correct test level. Our approach is illustrated on a range of problems, including testing for goodness-of-fit of an unnormalised non-parametric density model, and an intractable generative model of a biological cellular network.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes