May 24

Sampling Distributions as Regularization in Learned Inverse Problems

arXiv:2605.25177
AI Analysis

This work provides a theoretical and practical framework for understanding and choosing sampling distributions in learned inverse problems, which is important for scientific applications where observational data is scarce.

The paper shows that the sampling distribution used to generate synthetic training data for learned inverse problems acts as an implicit regularization operator, and demonstrates that a mismatched sampling distribution degrades reconstruction quality in ways that cannot be fully corrected by more expressive architectures or physics-informed residuals.

Neural networks have emerged as effective tools for solving ill-posed inverse problems. In many scientific applications, however, observational training data are insufficient, and learned inverse operators must instead be trained on synthetic data generated from the forward model. This requires specifying unknown parameters in the forward model and solving the model to generate synthetic observations. Typically, the unknown parameters are sampled from a prescribed probability distribution. Here, we show that this sampling strategy is not a neutral preprocessing step, but instead defines an implicit regularization operator. This result follows from the fact that the learned inverse operator minimizes empirical risk together with the classical result that conditional expectation minimizes mean-square error. We present theoretical results for the implicit regularization operator in both infinite- and finite-data settings, including Physics-Informed Neural Networks (PINNs). These results are demonstrated numerically on three inverse problems of increasing complexity: a 1D linear Fredholm integral equation, a 1D nonlinear subsurface interface inversion, and a 2D nonlinear cross-well seismic traveltime tomography problem. Across all three problems, three distinct sources of regularization are identified in the learned operator: prior sampling, architectural, and physics-informed regularization. A mismatched sampling distribution is shown to degrade reconstruction quality in ways that neither more expressive architectures nor augmented physics residuals can fully correct. The results demonstrate that the sampling distribution should be chosen with the same care as a classical regularization functional and provide a practical framework for implementing more sophisticated regularization operators using neural networks.
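
The argument in the abstract (empirical risk minimization plus the optimality of conditional expectation under mean-square error) can be illustrated on a toy linear problem. The sketch below is not the paper's experiment. It assumes a hypothetical 8-point discretized Gaussian smoothing kernel as the forward operator, a zero-mean Gaussian sampling distribution with standard deviation prior_std, Gaussian noise with standard deviation noise_std, and a purely linear "learned" inverse fitted by least squares instead of a neural network. Under those assumptions the fitted operator should approach the Tikhonov-regularized inverse with parameter (noise_std / prior_std)^2, so changing the sampling distribution changes the regularization. All names and parameter values are illustrative.

```python
import numpy as np


def fit_linear_inverse(A, prior_std, noise_std, n_samples, rng):
    """Fit W in x ~ W y by least squares on synthetic pairs (x, y = A x + noise)."""
    m, n = A.shape
    # Sample unknown parameters from the prescribed (assumed Gaussian) distribution.
    X = prior_std * rng.standard_normal((n_samples, n))
    # Push them through the forward model and add observation noise.
    Y = X @ A.T + noise_std * rng.standard_normal((n_samples, m))
    # Empirical risk minimization: min_B ||Y B - X||_F^2, then W = B^T.
    B, *_ = np.linalg.lstsq(Y, X, rcond=None)
    return B.T


def tikhonov_inverse(A, lam):
    """Classical Tikhonov-regularized inverse (A^T A + lam I)^{-1} A^T."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T)


rng = np.random.default_rng(0)

# Hypothetical ill-conditioned forward operator: a discretized smoothing kernel.
n = 8
t = np.linspace(0.0, 1.0, n)
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.15**2)) / n

noise_std = 0.1
learned = {}
rel_err = {}
for prior_std in (0.5, 2.0):
    W = fit_linear_inverse(A, prior_std, noise_std, 200_000, rng)
    W_tik = tikhonov_inverse(A, (noise_std / prior_std) ** 2)
    learned[prior_std] = W
    # Distance between the fitted operator and the matching Tikhonov inverse.
    rel_err[prior_std] = np.linalg.norm(W - W_tik) / np.linalg.norm(W_tik)
    print(f"prior_std={prior_std}: relative gap to Tikhonov = {rel_err[prior_std]:.4f}")

# How much the learned operator moves when only the sampling distribution changes.
prior_shift = np.linalg.norm(learned[0.5] - learned[2.0]) / np.linalg.norm(learned[2.0])
print(f"relative change between the two learned operators = {prior_shift:.4f}")
```

In this linear-Gaussian setting the conditional expectation is available in closed form, which is what makes the comparison possible. For the nonlinear problems described in the abstract no such closed form is assumed here.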
