ML IM LG AP ME Aug 4, 2025

Trustworthy scientific inference for inverse problems with generative models

James Carzon, Luca Masserano, Joshua D. Ingram, Alex Shen, Antonio Carlos Herling Ribeiro Junior, Tommaso Dorigo, Michele Doro, Joshua S. Speagle, Rafael Izbicki, Ann B. Lee

arXiv:2508.02602

Originality: Highly original

AI Analysis

The paper addresses the need for trustworthy scientific inference in fields such as the physical sciences, where direct likelihood evaluation is infeasible, and offers a way to make parameter estimation reliable.

The paper tackles the biased or overconfident conclusions that can arise when generative models are applied to inverse problems in scientific inference. It presents FreB, a protocol that reshapes AI-generated probability distributions into confidence regions with validity guarantees; these regions achieve minimum size when training and target data align.

Generative artificial intelligence (AI) excels at producing complex data structures (text, images, videos) by learning patterns from training examples. Across scientific disciplines, researchers are now applying generative models to "inverse problems" to infer hidden parameters from observed data. While these methods can handle intractable models and large-scale studies, they can also produce biased or overconfident conclusions. We present a solution with Frequentist-Bayes (FreB), a mathematically rigorous protocol that reshapes AI-generated probability distributions into confidence regions that consistently include true parameters with the expected probability, while achieving minimum size when training and target data align. We demonstrate FreB's effectiveness by tackling diverse case studies in the physical sciences: identifying unknown sources under dataset shift, reconciling competing theoretical models, and mitigating selection bias and systematics in observational studies. By providing validity guarantees with interpretable diagnostics, FreB enables trustworthy scientific inference across fields where direct likelihood evaluation remains impossible or prohibitively expensive.
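
The idea of turning a posterior into a confidence region can be illustrated with a toy Neyman construction. The sketch below is not the authors' implementation: it assumes a one-dimensional Gaussian model with a deliberately narrow Gaussian prior standing in for the "AI-generated" posterior, and it calibrates cutoffs by brute-force Monte Carlo on a parameter grid. The function names (`log_posterior`, `calibrate_cutoffs`, `confidence_set`), the grid, and all numerical settings are hypothetical choices made for illustration.

```python
import numpy as np

def log_posterior(theta, x, prior_sd=1.0, noise_sd=1.0):
    """Log density of the conjugate posterior p(theta | x) for
    x ~ N(theta, noise_sd^2) with prior theta ~ N(0, prior_sd^2).
    This closed form is a stand-in for a learned posterior estimator."""
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / noise_sd**2)
    post_mean = post_var * x / noise_sd**2
    return -0.5 * np.log(2 * np.pi * post_var) - 0.5 * (theta - post_mean) ** 2 / post_var

def calibrate_cutoffs(theta_grid, alpha=0.1, n_sim=2000, noise_sd=1.0, seed=0):
    """For each theta on the grid, simulate data from the likelihood and
    record the alpha-quantile of the posterior-based test statistic.
    Rejecting theta when the statistic falls below this cutoff gives a
    test with approximate level alpha at that theta."""
    rng = np.random.default_rng(seed)
    cutoffs = np.empty_like(theta_grid)
    for i, theta in enumerate(theta_grid):
        x_sim = rng.normal(theta, noise_sd, size=n_sim)
        stats = log_posterior(theta, x_sim, noise_sd=noise_sd)
        cutoffs[i] = np.quantile(stats, alpha)
    return cutoffs

def confidence_set(x_obs, theta_grid, cutoffs):
    """Keep every grid value of theta that the calibrated test does not reject."""
    return theta_grid[log_posterior(theta_grid, x_obs) >= cutoffs]

# Hypothetical settings: a grid wider than the prior and one observation.
theta_grid = np.linspace(-6.0, 6.0, 241)
cutoffs = calibrate_cutoffs(theta_grid, alpha=0.1)
region = confidence_set(3.0, theta_grid, cutoffs)
print(f"approximate 90% confidence set: [{region.min():.2f}, {region.max():.2f}]")
```

Because the cutoff is calibrated separately at each parameter value, the resulting set targets the nominal coverage even where the prior is a poor match for the true parameter, which is the property the abstract describes. A practical version would replace the grid loop with a learned calibration step; that choice is left open here.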
