ML LG ME

Apr 11, 2024

Diffusion posterior sampling for simulation-based inference in tall data settings

Julia Linhart, Gabriel Victorino Cardoso, Alexandre Gramfort, Sylvain Le Corff, Pedro L. C. Rodrigues

arXiv:2404.07593v2 | 20.215 citations | h-index: 12 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of parameter inference in tall data settings for scientists using complex simulators, offering an incremental improvement over prior methods.

The authors address the problem of inferring parameters from multiple observations in simulation-based inference. They propose a diffusion posterior sampling method that relies on a score network trained on single observations, and they report better numerical stability and lower computational cost than existing approaches.

Determining which parameters of a non-linear model best describe a set of experimental data is a fundamental problem in science, and it has gained much traction lately with the rise of complex large-scale simulators. The likelihood of such models is typically intractable, which is why classical MCMC methods cannot be used. Simulation-based inference (SBI) stands out in this context by only requiring a dataset of simulations to train deep generative models capable of approximating the posterior distribution that relates input parameters to a given observation. In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model. The proposed method builds upon recent developments from the flourishing score-based diffusion literature and makes it possible to estimate the tall data posterior distribution using only information from a score network trained for a single context observation. We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
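To illustrate why single-observation scores can carry information about the tall data posterior, here is a minimal sketch on a toy Gaussian model. It is not the authors' algorithm. It assumes the observations are conditionally independent given the parameter, in which case Bayes' rule gives p(θ | x_1, ..., x_n) ∝ p(θ)^(1-n) ∏ p(θ | x_i), so the tall posterior score is (1 - n) times the prior score plus the sum of the single-observation posterior scores. The closed-form scores stand in for a trained score network, and the constants and function names are hypothetical. The identity is checked only for the undiffused posterior; at positive diffusion times it does not hold exactly in general.

```python
import math

# Toy conjugate model (assumed for illustration only):
#   theta ~ N(0, PRIOR_STD^2),   x_i | theta ~ N(theta, NOISE_STD^2)
PRIOR_STD = 2.0
NOISE_STD = 1.0


def prior_score(theta):
    """Gradient of log p(theta) with respect to theta."""
    return -theta / PRIOR_STD**2


def single_posterior_score(theta, x):
    """Gradient of log p(theta | x) for ONE observation (closed form).

    Stands in for a score network trained on single observations.
    """
    precision = 1.0 / PRIOR_STD**2 + 1.0 / NOISE_STD**2
    mean = (x / NOISE_STD**2) / precision
    return -(theta - mean) * precision


def composed_tall_score(theta, xs):
    """Tall posterior score assembled from single-observation scores:
    (1 - n) * prior score + sum of single-observation posterior scores.
    """
    n = len(xs)
    return (1 - n) * prior_score(theta) + sum(
        single_posterior_score(theta, x) for x in xs
    )


def exact_tall_score(theta, xs):
    """Gradient of log p(theta | x_1, ..., x_n), computed directly."""
    n = len(xs)
    precision = 1.0 / PRIOR_STD**2 + n / NOISE_STD**2
    mean = (sum(xs) / NOISE_STD**2) / precision
    return -(theta - mean) * precision


observations = [0.3, -1.2, 0.8, 2.1]
for theta in (-1.0, 0.5, 3.0):
    composed = composed_tall_score(theta, observations)
    exact = exact_tall_score(theta, observations)
    assert math.isclose(composed, exact, rel_tol=1e-9, abs_tol=1e-12)
```

In this toy case the two scores agree for any number of observations, which is what makes it attractive to reuse a network trained on a single context observation rather than retraining for each set size.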
