ML · LG · Jul 8, 2021

Likelihood-Free Frequentist Inference: Bridging Classical Statistics and Machine Learning for Reliable Simulator-Based Inference

Niccolò Dalmasso, Luca Masserano, David Zhao, Rafael Izbicki, Ann B. Lee

arXiv:2107.03920v107.415 citations · Has Code

Originality: Highly original

AI Analysis

This paper addresses the need for valid scientific inference in fields that rely on simulators, offering a modular framework that bridges classical statistics and machine learning.

The paper tackles the problem of constructing reliable confidence sets in likelihood-free inference (LFI) settings, where simulators have intractable likelihoods. It proposes a framework called LF2I that provides confidence sets with near finite-sample validity together with interpretable coverage diagnostics, and demonstrates it on high-dimensional, complex data.

Many areas of science rely on simulators that implicitly encode intractable likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, especially outside asymptotic and low-dimensional regimes. At the same time, popular LFI methods, such as Approximate Bayesian Computation or more recent machine learning techniques, do not necessarily lead to valid scientific inference because they do not guarantee confidence sets with nominal coverage in general settings. In addition, LFI currently lacks practical diagnostic tools to check the actual coverage of computed confidence sets across the entire parameter space. In this work, we propose a modular inference framework that bridges classical statistics and modern machine learning to provide (i) a practical approach for constructing confidence sets with near finite-sample validity at any value of the unknown parameters, and (ii) interpretable diagnostics for estimating empirical coverage across the entire parameter space. We refer to this framework as likelihood-free frequentist inference (LF2I). Any method that defines a test statistic can leverage LF2I to create valid confidence sets and diagnostics without costly Monte Carlo or bootstrap samples at fixed parameter settings. We study two likelihood-based test statistics (ACORE and BFF) and demonstrate their performance on high-dimensional complex data. Code is available at https://github.com/lee-group-cmu/lf2i.
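The coverage diagnostics in point (ii) can be sketched in the same spirit: simulate (θ, X) pairs across the parameter space, record whether each θ falls inside the confidence set computed from its own X, and estimate the coverage probability as a function of θ. The heteroscedastic toy simulator and the deliberately naive interval below are assumptions made to show what a failing region looks like, and a simple binned average stands in for the regression one would fit in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
NOMINAL = 0.90
Z = 1.6449  # standard normal 95th percentile, so P(|N(0,1)| <= Z) = 0.90

def simulate(theta, rng):
    """Toy simulator (assumption): X ~ N(theta, s^2) with s = 1 + 0.5 * max(theta, 0),
    so the noise grows only for positive theta."""
    scale = 1.0 + 0.5 * np.maximum(theta, 0.0)
    return theta + scale * rng.standard_normal(theta.shape)

def naive_set_contains(x, theta):
    """Membership test for a naive interval x +/- Z that wrongly assumes unit noise."""
    return np.abs(x - theta) <= Z

def coverage_diagnostic(theta, covered, edges):
    """Estimate P(theta in R(X) | theta) by averaging coverage indicators per bin.
    A binned mean stands in for a regression of the indicators on theta."""
    which = np.digitize(theta, edges[1:-1])
    return np.array([covered[which == b].mean() for b in range(len(edges) - 1)])

theta = rng.uniform(-4.0, 4.0, size=50_000)
x = simulate(theta, rng)
covered = naive_set_contains(x, theta).astype(float)

edges = np.linspace(-4.0, 4.0, 9)
coverage = coverage_diagnostic(theta, covered, edges)
for lo, hi, c in zip(edges[:-1], edges[1:], coverage):
    flag = "under-covers" if c < NOMINAL - 0.02 else "ok"
    print(f"theta in [{lo:+.0f}, {hi:+.0f}): coverage {c:.3f} ({flag})")
```

In this toy setup the naive interval reaches the nominal 90% for negative θ but falls well short for positive θ, which is exactly the kind of parameter-dependent failure that a single averaged coverage number would hide.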
