Yunan Yang

NA
h-index: 14
25 papers
628 citations
Novelty: 50%
AI Score: 55

25 Papers

NA · Aug 27, 2025
Operator learning meets inverse problems: A probabilistic perspective

Nicholas H. Nelsen, Yunan Yang

Operator learning offers a robust framework for approximating mappings between infinite-dimensional function spaces. It has also become a powerful tool for solving inverse problems in the computational sciences. This chapter surveys methodological and theoretical developments at the intersection of operator learning and inverse problems. It begins by summarizing the probabilistic and deterministic approaches to inverse problems, and pays special attention to emerging measure-centric formulations that treat observed data or unknown parameters as probability distributions. The discussion then turns to operator learning by covering essential components such as data generation, loss functions, and widely used architectures for representing function-to-function maps. The core of the chapter centers on the end-to-end inverse operator learning paradigm, which aims to directly map observed data to the solution of the inverse problem without requiring explicit knowledge of the forward map. It highlights the unique challenge that noise poses in this data-driven inversion setting, presents structure-aware architectures for both point predictions and posterior estimates, and surveys relevant theory for linear and nonlinear inverse problems. The chapter also discusses the estimation of priors and regularizers, where operator learning is used more selectively within classical inversion algorithms.

GEO-PH · May 10, 2017
Application of Optimal Transport and the Quadratic Wasserstein Metric to Full-Waveform Inversion

Yunan Yang, Björn Engquist, Junzhe Sun et al.

Conventional full-waveform inversion (FWI) using the least-squares norm ($L^2$) as a misfit function is known to suffer from cycle skipping. This increases the risk of computing a local rather than the global minimum of the misfit. In our previous work, we proposed the quadratic Wasserstein metric ($W_2$) as a new misfit function for FWI. The $W_2$ metric has been proved to have many ideal properties with regard to convexity and insensitivity to noise. When the observed and predicted seismic data are regarded as two density functions, the quadratic Wasserstein metric corresponds to the optimal cost of rearranging one density into the other, where the transportation cost is quadratic in distance. The difficulty of transforming seismic signals into nonnegative density functions is discussed. Unlike the $L^2$ norm, $W_2$ measures not only amplitude differences, but also global phase shifts, which helps to avoid cycle skipping issues. In this work, we build on our earlier method to cover more realistic high-resolution applications by embedding the $W_2$ technique into the framework of the adjoint-state method and applying it to seismically relevant 2D examples: the Camembert, the Marmousi, and the 2004 BP models. We propose a new way of using the $W_2$ metric trace-by-trace in FWI and compare it to global $W_2$ via the solution of the Monge-Ampère equation. With the corresponding adjoint source, the velocity model can be updated using the L-BFGS method. Numerical results show the effectiveness of $W_2$ for alleviating cycle skipping issues and sensitivity to noise. Both mathematical theory and numerical examples demonstrate that the quadratic Wasserstein metric is a good candidate for a misfit function in seismic inversion.
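The trace-by-trace idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes each trace has already been turned into a nonnegative signal on a uniform time grid, and it uses the standard 1D fact that the squared $W_2$ distance equals the integral of the squared difference of the two inverse cumulative distribution functions. The names `w2_squared_1d` and `gaussian_pulse` and the parameter `n_quantiles` are hypothetical.

```python
import numpy as np

def w2_squared_1d(f, g, t, n_quantiles=2000):
    """Squared W2 distance between two nonnegative 1D signals on a uniform grid t."""
    dt = t[1] - t[0]
    # Cumulative distribution functions, rescaled so both signals have unit mass.
    F = np.cumsum(f) * dt
    G = np.cumsum(g) * dt
    F = F / F[-1]
    G = G / G[-1]
    # Invert both CDFs at midpoint quantile levels by linear interpolation.
    s = (np.arange(n_quantiles) + 0.5) / n_quantiles
    F_inv = np.interp(s, F, t)
    G_inv = np.interp(s, G, t)
    # In 1D, W2^2 is the integral over s of (F^{-1}(s) - G^{-1}(s))^2.
    return float(np.mean((F_inv - G_inv) ** 2))

def gaussian_pulse(t, center, width=0.1):
    # Toy stand-in for a normalized seismic trace (an assumption, not real data).
    return np.exp(-0.5 * ((t - center) / width) ** 2)

t = np.linspace(0.0, 4.0, 4001)
reference = gaussian_pulse(t, 1.5)
small_shift = w2_squared_1d(reference, gaussian_pulse(t, 1.8), t)
large_shift = w2_squared_1d(reference, gaussian_pulse(t, 2.1), t)
```

For a pure time shift the value grows like the square of the shift, which is the convexity with respect to shifts that the abstract contrasts with the $L^2$ misfit.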

GEO-PH · Apr 18, 2016
Optimal Transport for Seismic Full Waveform Inversion

Björn Engquist, Brittany D. Froese, Yunan Yang

Full waveform inversion is a successful procedure for determining properties of the earth from surface measurements in seismology. This inverse problem is solved by a PDE-constrained optimization where unknown coefficients in a computed wavefield are adjusted to minimize the mismatch with the measured data. We propose using the Wasserstein metric, which is related to optimal transport, for measuring this mismatch. Several advantageous properties are proved with regard to convexity of the objective function and robustness with respect to noise. The Wasserstein metric is computed by solving a Monge-Ampère equation. We describe an algorithm for computing its Fréchet gradient for use in the optimization. Numerical examples are given.

LG · Jan 26, 2023
Neural Inverse Operators for Solving PDE Inverse Problems

Roberto Molinaro, Yunan Yang, Björn Engquist et al.

A large class of inverse problems for PDEs are only well-defined as mappings from operators to functions. Existing operator learning frameworks map functions to functions and need to be modified to learn inverse maps from data. We propose a novel architecture termed Neural Inverse Operators (NIOs) to solve these PDE inverse problems. Motivated by the underlying mathematical structure, NIO is based on a suitable composition of DeepONets and FNOs to approximate mappings from operators to functions. A variety of experiments are presented to demonstrate that NIOs significantly outperform baselines, solve PDE inverse problems robustly and accurately, and are several orders of magnitude faster than existing direct and PDE-constrained optimization methods.

NA · Oct 19, 2018
Seismic Inversion and the Data Normalization for Optimal Transport

Björn Engquist, Yunan Yang

Full waveform inversion (FWI) has recently become a favorite technique for the inverse problem of finding properties in the earth from measurements of vibrations of seismic waves on the surface. Mathematically, FWI is PDE-constrained optimization where model parameters in a wave equation are adjusted such that the misfit between the computed and the measured dataset is minimized. In a sequence of papers, we have shown that the quadratic Wasserstein distance from optimal transport is preferable to the standard $L^2$ norm as a misfit functional. Datasets, however, first need to be normalized since seismic signals do not satisfy the requirements of optimal transport. There has been a puzzling contradiction in the results. Normalization methods that satisfy theorems pointing to ideal properties for FWI have not performed well in practical computations, and other scaling methods that do not satisfy these theorems have performed much better in practice. In this paper, we will shed light on this issue and resolve this contradiction.

NA · Aug 14, 2018
Seismic Imaging and Optimal Transport

Björn Engquist, Yunan Yang

Seismology has been an active science for a long time. It changed character about 50 years ago when the earth's vibrations could be measured on the surface more accurately and more frequently in space and time. The full wave field could be determined, and partial differential equations (PDE) started to be used in the inverse process of finding properties of the interior of the earth. We will briefly review earlier techniques but mainly focus on Full Waveform Inversion (FWI) for the acoustic formulation. FWI is a PDE-constrained optimization in which the variable velocity in a forward wave equation is adjusted such that the solution matches measured data on the surface. The minimization of the mismatch is usually coupled with the adjoint state method, which also includes the solution to an adjoint wave equation. The least-squares norm is the conventional objective function measuring the difference between simulated and measured data, but it often results in the minimization being trapped in local minima. One way to mitigate this is by selecting another misfit function with better convexity properties. Here we propose using the quadratic Wasserstein metric as a new misfit function in FWI. The optimal map defining the quadratic Wasserstein metric can be computed by solving a Monge-Ampère equation. Theorems pointing to the advantages of using optimal transport over the least-squares norm will be discussed, and a number of large-scale computational examples will be presented.

OC · Feb 8, 2023
Adaptive State-Dependent Diffusion for Derivative-Free Optimization

Björn Engquist, Kui Ren, Yunan Yang

This paper develops and analyzes a stochastic derivative-free optimization strategy. A key feature is the state-dependent adaptive variance. We prove global convergence in probability with algebraic rate and give quantitative results in numerical examples. A striking fact is that convergence is achieved without explicit information about the gradient and even without comparing different objective function values as in established methods such as the simplex method and simulated annealing. It can otherwise be compared to annealing with state-dependent temperature.

OC · Apr 12, 2022
An Algebraically Converging Stochastic Gradient Descent Algorithm for Global Optimization

Björn Engquist, Kui Ren, Yunan Yang

We propose a new gradient descent algorithm with added stochastic terms for finding the global optimizers of nonconvex optimization problems. A key component in the algorithm is the adaptive tuning of the randomness based on the value of the objective function. In the language of simulated annealing, the temperature is state-dependent. With this, we prove the global convergence of the algorithm with an algebraic rate both in probability and in the parameter space. This is a significant improvement over the classical rate from using a more straightforward control of the noise term. The convergence proof is based on the actual discrete setup of the algorithm, not just its continuous limit as often done in the literature. We also present several numerical examples to demonstrate the efficiency and robustness of the algorithm for reasonably complex objective functions.
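The role of a state-dependent noise level can be sketched on a toy problem. This is only an illustration under stated assumptions, not the algorithm analyzed in the paper: the double-well objective, the two-level schedule in `noise_level`, the `cutoff` and `sigma_high` values, and the step size are all hypothetical choices.

```python
import numpy as np

def objective(x):
    # Toy double well: global minimum f(1) = 0, local minimum near x = -0.82.
    return (x**2 - 1.0) ** 2 + 0.3 * (x - 1.0) ** 2

def gradient(x):
    return 4.0 * x * (x**2 - 1.0) + 0.6 * (x - 1.0)

def noise_level(f_value, cutoff=0.5, sigma_high=1.0):
    # Hypothetical schedule: strong noise where the objective is large,
    # no noise once the objective falls below the cutoff.
    return sigma_high if f_value > cutoff else 0.0

def adaptive_noise_descent(x0, step=0.01, n_steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    x = float(x0)
    for _ in range(n_steps):
        sigma = noise_level(objective(x))
        # Gradient step plus Gaussian noise whose size depends on f(x).
        x = x - step * gradient(x) + sigma * np.sqrt(step) * rng.standard_normal()
    return x

# Start inside the basin of the local (non-global) minimum.
x_final = adaptive_noise_descent(x0=-0.8)
```

In this sketch the iterate keeps receiving noise while it sits in the shallow well, where the objective stays above the cutoff, and switches to plain gradient descent once it reaches the low-objective region around the global minimizer.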

ML · Apr 19, 2023
Generative modeling of time-dependent densities via optimal transport and projection pursuit

Jonah Botvinick-Greenhouse, Yunan Yang, Romit Maulik

Motivated by the computational difficulties incurred by popular deep learning algorithms for the generative modeling of temporal densities, we propose a cheap alternative that requires minimal hyperparameter tuning and scales favorably to high-dimensional problems. In particular, we use a projection-based optimal transport solver [Meng et al., 2019] to join successive samples and subsequently use transport splines [Chewi et al., 2020] to interpolate the evolving density. When the sampling frequency is sufficiently high, the optimal maps are close to the identity and are thus computationally efficient to compute. Moreover, the training process is highly parallelizable as all optimal maps are independent and can thus be learned simultaneously. Finally, the approach is based solely on numerical linear algebra rather than minimizing a nonconvex objective function, allowing us to easily analyze and control the algorithm. We present several numerical experiments on both synthetic and real-world datasets to demonstrate the efficiency of our method. In particular, these experiments show that the proposed approach is highly competitive compared with state-of-the-art normalizing flows conditioned on time across a wide range of dimensionalities.

NA · Feb 4, 2019
Analysis and Application of Optimal Transport For Challenging Seismic Inverse Problems

Yunan Yang

In seismic exploration, sources and measurements of seismic waves on the surface are used to determine model parameters representing geophysical properties of the earth. Full-waveform inversion (FWI) is a nonlinear seismic inverse technique that inverts the model parameters by minimizing the difference between the synthetic data from the forward wave propagation and the observed true data in PDE-constrained optimization. The traditional least-squares method of measuring this difference suffers from three main drawbacks: trapping in local minima, sensitivity to noise, and difficulties in reconstruction below reflecting layers. Unlike the local amplitude comparison in the least-squares method, the quadratic Wasserstein distance from the optimal transport theory considers both the amplitude differences and the phase mismatches when measuring data misfit. We will briefly review our earlier development and analysis of optimal transport-based inversion and include improvements, for example, a stronger convexity proof. The main focus will be on the third "challenge" with new results on sub-reflection recovery.

DS · Sep 13, 2024
Measure-Theoretic Time-Delay Embedding

Jonah Botvinick-Greenhouse, Maria Oprea, Romit Maulik et al.

The celebrated Takens' embedding theorem provides a theoretical foundation for reconstructing the full state of a dynamical system from partial observations. However, the classical theorem assumes that the underlying system is deterministic and that observations are noise-free, limiting its applicability in real-world scenarios. Motivated by these limitations, we formulate a measure-theoretic generalization that adopts an Eulerian description of the dynamics and recasts the embedding as a pushforward map between spaces of probability measures. Our mathematical results leverage recent advances in optimal transport. Building on the proposed measure-theoretic time-delay embedding theory, we develop a computational procedure that aims to reconstruct the full state of a dynamical system from time-lagged partial observations, engineered with robustness to handle sparse and noisy data. We evaluate our measure-based approach across several numerical examples, ranging from the classic Lorenz-63 system to real-world applications such as NOAA sea surface temperature reconstruction and ERA5 wind field reconstruction.

ML · Sep 30, 2024
Stochastic Inverse Problem: stability, regularization and Wasserstein gradient flow

Qin Li, Maria Oprea, Li Wang et al.

Inverse problems in physical or biological sciences often involve recovering an unknown parameter that is random. The sought-after quantity is a probability distribution of the unknown parameter that produces data aligning with the measurements. Consequently, these problems are naturally framed as stochastic inverse problems. In this paper, we explore three aspects of this problem: direct inversion, variational formulation with regularization, and optimization via gradient flows, drawing parallels with deterministic inverse problems. A key difference from the deterministic case is the space in which we operate. Here, we work within probability space rather than Euclidean or Sobolev spaces, making tools from measure transport theory necessary for the study. Our findings reveal that the choice of metric, both in the design of the loss function and in the optimization process, significantly impacts the stability and properties of the optimizer.

25.0 · OC · Apr 30
Well-Posedness and Efficient Algorithms for Inverse Optimal Transport with Bregman Regularization

Chenglong Bao, Zanyu Li, Yunan Yang

This work analyzes the inverse optimal transport (IOT) problem under Bregman regularization. We establish well-posedness results, including existence, uniqueness (up to equivalence classes of solutions), and stability, under several structural assumptions on the cost matrix. On the computational side, we investigate the existence of solutions to the optimization problem with general constraints on the cost matrix and provide a sufficient condition guaranteeing existence. In addition, we propose an inexact block coordinate descent (BCD) method for the problem with a strongly convex penalty term. In particular, when the penalty is quadratic, the subproblems admit a diagonal Hessian structure, which enables highly efficient element-wise Newton updates. We establish a linear convergence rate for the algorithm and demonstrate its practical performance through numerical experiments, including the validation of stability bounds, the investigation of regularization effects, and the application to a marriage matching dataset.

37.0 · ML · Apr 9
On the Unique Recovery of Transport Maps and Vector Fields from Finite Measure-Valued Data

Jonah Botvinick-Greenhouse, Yunan Yang

We establish guarantees for the unique recovery of vector fields and transport maps from finite measure-valued data, yielding new insights into generative models, data-driven dynamical systems, and PDE inverse problems. In particular, we provide general conditions under which a diffeomorphism can be uniquely identified from its pushforward action on finitely many densities, i.e., when the data $\{(\rho_j, f_\# \rho_j)\}_{j=1}^m$ uniquely determines $f$. As a corollary, we introduce a new metric which compares diffeomorphisms by measuring the discrepancy between finitely many pushforward densities in the space of probability measures. We also prove analogous results in an infinitesimal setting, where derivatives of the densities along a smooth vector field are observed, i.e., when $\{(\rho_j, \operatorname{div}(\rho_j v))\}_{j=1}^m$ uniquely determines $v$. Our analysis makes use of the Whitney and Takens embedding theorems, which provide estimates on the required number of densities $m$, depending only on the intrinsic dimension of the problem. We additionally interpret our results through the lens of Perron--Frobenius and Koopman operators and demonstrate how our techniques lead to new guarantees for the well-posedness of certain PDE inverse problems related to continuity, advection, Fokker--Planck, and advection-diffusion-reaction equations. Finally, we present illustrative numerical experiments demonstrating the unique identification of transport maps from finitely many pushforward densities, and of vector fields from finitely many weighted divergence observations.

LG · May 28, 2022
Tuning Frequency Bias in Neural Network Training with Nonuniform Data

Annan Yu, Yunan Yang, Alex Townsend

Small generalization errors of over-parameterized neural networks (NNs) can be partially explained by the frequency biasing phenomenon, where gradient-based algorithms minimize the low-frequency misfit before reducing the high-frequency residuals. Using the Neural Tangent Kernel (NTK), one can provide a theoretically rigorous analysis for training where data are drawn from constant or piecewise-constant probability densities. Since most training data sets are not drawn from such distributions, we use the NTK model and a data-dependent quadrature rule to theoretically quantify the frequency biasing of NN training given fully nonuniform data. By replacing the loss function with a carefully selected Sobolev norm, we can further amplify, dampen, counterbalance, or reverse the intrinsic frequency biasing in NN training.
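Replacing the loss with a Sobolev norm can be pictured with a short sketch. It assumes a residual sampled on a uniform periodic grid, which is simpler than the nonuniform-data setting the abstract addresses, and it uses the common Fourier weight $(1+k^2)^s$; the function name `sobolev_loss` is hypothetical.

```python
import numpy as np

def sobolev_loss(residual, s):
    """Discrete H^s-type squared norm of a residual on a uniform periodic grid."""
    n = len(residual)
    coeffs = np.fft.fft(residual) / n       # Fourier coefficients of the residual
    k = np.fft.fftfreq(n, d=1.0 / n)        # integer wavenumbers
    weights = (1.0 + k**2) ** s             # s > 0 emphasizes high frequencies
    return float(np.sum(weights * np.abs(coeffs) ** 2))

x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
residual = np.sin(3.0 * x)                  # a pure frequency-3 error

plain = sobolev_loss(residual, s=0.0)       # reduces to the mean-square error
damped = sobolev_loss(residual, s=-1.0)     # high-frequency error counts less
amplified = sobolev_loss(residual, s=1.0)   # high-frequency error counts more
```

With `s = 0` the loss reduces to the usual mean-square error. Negative `s` down-weights the frequency-3 component and positive `s` up-weights it, which is the kind of lever the abstract describes for dampening or amplifying frequency bias.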

22.0 · NA · Mar 15
Inference of interacting kernel in the mean-field regime

Peiyi Chen, Qin Li, Li Wang et al.

We study the problem of reconstructing interaction kernels in systems of interacting agents from macroscopic measurements when posed as an optimization problem. The reconstruction procedure depends on the formulation of the forward model, which may be given either by a finite-dimensional coupled ODE system tracking individual agent trajectories or by a mean-field PDE describing the evolution of the agent density. We investigate the similarities and differences between these two formulations in the mean-field regime. While the first variation derived from the particle system does not provide an unbiased estimator of the first variation associated with the limiting PDE, we prove that, under mild assumptions, the two are close in a weak sense with a convergence rate $\mathcal{O}(N^{-1/2})$. This rate is further confirmed by numerical evidence.

LG · Nov 20, 2024
Sampling with Adaptive Variance for Multimodal Distributions

Björn Engquist, Kui Ren, Yunan Yang

We propose and analyze a class of adaptive sampling algorithms for multimodal distributions on a bounded domain, which share a structural resemblance to the classic overdamped Langevin dynamics. We first demonstrate that this class of linear dynamics with adaptive diffusion coefficients and vector fields can be interpreted and analyzed as weighted Wasserstein gradient flows of the Kullback--Leibler (KL) divergence between the current distribution and the target Gibbs distribution, which directly leads to the exponential convergence of both the KL and $\chi^2$ divergences, with rates depending on the weighted Wasserstein metric and the Gibbs potential. We then show that a derivative-free version of the dynamics can be used for sampling without gradient information of the Gibbs potential and that for Gibbs distributions with nonconvex potentials, this approach could achieve significantly faster convergence than the classical overdamped Langevin dynamics. A comparison of the mean transition times between local minima of a nonconvex potential further highlights the better efficiency of the derivative-free dynamics in sampling.

5.8 · NA · Mar 21
Adjoint DSMC Method for Spatially Inhomogeneous Boltzmann Equation with General Boundary Conditions

Russel Caflisch, Yunan Yang

This manuscript derives adjoint equations for the numerical solution of the spatially inhomogeneous Boltzmann equation using Direct Simulation Monte Carlo (DSMC). The formulation accounts for spatial transport and a range of boundary conditions, including periodic boundaries, specular reflection, thermal reflection, and prescribed inflow. Numerical experiments are presented to validate the resulting adjoint system. These adjoint formulations are intended for use in gradient-based optimization, sensitivity analysis, and design problems involving rarefied gas dynamics.

ML · Sep 5, 2025
Cryo-EM as a Stochastic Inverse Problem

Diego Sanchez Espinosa, Erik H Thiede, Yunan Yang

Cryo-electron microscopy (Cryo-EM) enables high-resolution imaging of biomolecules, but structural heterogeneity remains a major challenge in 3D reconstruction. Traditional methods assume a discrete set of conformations, limiting their ability to recover continuous structural variability. In this work, we formulate cryo-EM reconstruction as a stochastic inverse problem (SIP) over probability measures, where the observed images are modeled as the push-forward of an unknown distribution over molecular structures via a random forward operator. We pose the reconstruction problem as the minimization of a variational discrepancy between observed and simulated image distributions, using statistical distances such as the KL divergence and the Maximum Mean Discrepancy. The resulting optimization is performed over the space of probability measures via a Wasserstein gradient flow, which we numerically solve using particles to represent and evolve conformational ensembles. We validate our approach using synthetic examples, including a realistic protein model, which demonstrates its ability to recover continuous distributions over structural states. We analyze the connection between our formulation and Maximum A Posteriori (MAP) approaches, which can be interpreted as instances of the discretize-then-optimize (DTO) framework. We further provide a consistency analysis, establishing conditions under which DTO methods, such as MAP estimation, converge to the solution of the underlying infinite-dimensional continuous problem. Beyond cryo-EM, the framework provides a general methodology for solving SIPs involving random forward operators.

OC · Aug 23, 2025
HV Metric For Time-Domain Full Waveform Inversion

Matej Neumann, Yunan Yang

Full-waveform inversion (FWI) is a powerful technique for reconstructing high-resolution material parameters from seismic or ultrasound data. The conventional least-squares ($L^{2}$) misfit suffers from pronounced non-convexity that leads to cycle skipping. Optimal-transport misfits, such as the Wasserstein distance, alleviate this issue; however, their use requires artificially converting the wavefields into probability measures, a preprocessing step that can modify critical amplitude and phase information of time-dependent wave data. We propose the HV metric, a transport-based distance that acts naturally on signed signals, as an alternative to the $L^{2}$ and Wasserstein objectives in time-domain FWI. After reviewing the metric's definition and its relationship to optimal transport, we derive closed-form expressions for the Fréchet derivative and Hessian of the map $f \mapsto d_{\text{HV}}^2(f,g)$, enabling efficient adjoint-state implementations. A spectral analysis of the Hessian shows that, by tuning the hyperparameters $(\kappa,\lambda,\varepsilon)$, the HV misfit seamlessly interpolates between $L^{2}$, $H^{-1}$, and $H^{-2}$ norms, offering a tunable trade-off between the local point-wise matching and the global transport-based matching. Synthetic experiments on the Marmousi and BP benchmark models demonstrate that the HV metric-based objective function yields faster convergence and superior tolerance to poor initial models compared to both $L^{2}$ and Wasserstein misfits. These results demonstrate the HV metric as a robust, geometry-preserving alternative for large-scale waveform inversion.

LG · May 27, 2025
Learning where to learn: Training data distribution optimization for scientific machine learning

Nicolas Guerra, Nicholas H. Nelsen, Yunan Yang

In scientific machine learning, models are routinely deployed with parameter values or boundary conditions far from those used in training. This paper studies the learning-where-to-learn problem of designing a training data distribution that minimizes average prediction error across a family of deployment regimes. A theoretical analysis shows how the training distribution shapes deployment accuracy. This motivates two adaptive algorithms based on bilevel or alternating optimization in the space of probability measures. Discretized implementations using parametric distribution classes or nonparametric particle-based gradient flows deliver optimized training distributions that outperform nonadaptive designs. Once trained, the resulting models exhibit improved sample complexity and robustness to distribution shift. This framework unlocks the potential of principled data acquisition for learning functions and solution operators of partial differential equations.

DS · Nov 30, 2024
Invariant Measures in Time-Delay Coordinates for Unique Dynamical System Identification

Jonah Botvinick-Greenhouse, Robert Martin, Yunan Yang

While invariant measures are widely employed to analyze physical systems when a direct study of pointwise trajectories is intractable, e.g., due to chaos or noise, they cannot uniquely identify the underlying dynamics. Our first result shows that, in contrast to invariant measures in state coordinates, e.g., $[x(t), y(t), z(t)]$, the invariant measure expressed in time-delay coordinates, e.g., $[x(t), x(t-\tau),\ldots, x(t-(m-1)\tau)]$, can identify the dynamics up to a topological conjugacy. Our second result resolves the remaining ambiguity: by combining invariant measures constructed from multiple delay frames with distinct observables, the system is uniquely identifiable, provided that a suitable initial condition is satisfied. These guarantees require informative observables and appropriate delay parameters ($m,\tau$), which can be limiting in certain settings. We support our theoretical contributions through a series of physical examples demonstrating how invariant measures expressed in delay-coordinates can be used to perform robust system identification in practice.
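The time-delay coordinates in the abstract can be built from a scalar time series with a few lines of NumPy. This is a generic sketch of the standard construction, with the delay expressed in samples. The helper name `delay_embed` is hypothetical.

```python
import numpy as np

def delay_embed(x, m, lag):
    """Rows are [x(t), x(t - lag), ..., x(t - (m - 1) * lag)], with lag in samples."""
    x = np.asarray(x)
    start = (m - 1) * lag
    if start >= len(x):
        raise ValueError("time series too short for this (m, lag)")
    # Column j holds the series delayed by j * lag samples.
    columns = [x[start - j * lag : len(x) - j * lag] for j in range(m)]
    return np.column_stack(columns)

series = np.arange(10)
embedded = delay_embed(series, m=3, lag=2)
```

An empirical invariant measure in delay coordinates could then be approximated, for instance, by a histogram of the rows of `embedded` computed from a long trajectory.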

OC · Feb 13, 2022
Efficient Natural Gradient Descent Methods for Large-Scale PDE-Based Optimization Problems

Levon Nurbekyan, Wanzhou Lei, Yunan Yang

We propose efficient numerical schemes for implementing the natural gradient descent (NGD) for a broad range of metric spaces with applications to PDE-based optimization problems. Our technique represents the natural gradient direction as a solution to a standard least-squares problem. Hence, instead of calculating, storing, or inverting the information matrix directly, we apply efficient methods from numerical linear algebra. We treat both scenarios where the Jacobian, i.e., the derivative of the state variable with respect to the parameter, is either explicitly known or implicitly given through constraints. We can thus reliably compute several NGDs for a large-scale parameter space. In particular, we are able to compute Wasserstein NGD in thousands of dimensions, which was believed to be out of reach. Finally, our numerical results shed light on the qualitative differences between the standard gradient descent and various NGD methods based on different metric spaces in nonconvex optimization problems.
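The least-squares view of the natural gradient can be sketched in the simplest case, assuming an $L^2$ metric on the state space and an explicitly known Jacobian $J$. The direction $\eta$ then minimizes $\|J\eta + g\|_2$, where $g$ is the gradient of the loss with respect to the state. The linear forward map, the random test data, and the function name below are assumptions for illustration, not the paper's code.

```python
import numpy as np

def natural_gradient_direction(jacobian, state_gradient):
    """Solve min_eta ||J eta + g||_2 by least squares, without forming J^T J."""
    eta, *_ = np.linalg.lstsq(jacobian, -state_gradient, rcond=None)
    return eta

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))   # toy linear forward map: rho(theta) = A @ theta
d = rng.standard_normal(20)        # toy data

theta0 = np.ones(3)
state_gradient = A @ theta0 - d    # gradient of 0.5 * ||rho - d||^2 w.r.t. rho
theta1 = theta0 + natural_gradient_direction(A, state_gradient)
```

For this linear toy problem a single unit step lands on the least-squares solution, so the update coincides with a Gauss-Newton step. Other metrics on the state space would change the norm in the least-squares problem.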

LG · Jan 23, 2022
A Generalized Weighted Optimization Method for Computational Learning and Inversion

Björn Engquist, Kui Ren, Yunan Yang

The generalization capacity of various machine learning models exhibits different phenomena in the under- and over-parameterized regimes. In this paper, we focus on regression models such as feature regression and kernel regression and analyze a generalized weighted least-squares optimization method for computational learning and inversion with noisy data. The highlight of the proposed framework is that we allow weighting in both the parameter space and the data space. The weighting scheme encodes both a priori knowledge on the object to be learned and a strategy to weight the contribution of different data points in the loss function. Here, we characterize the impact of the weighting scheme on the generalization error of the learning method, where we derive explicit generalization errors for the random Fourier feature model in both the under- and over-parameterized regimes. For more general feature maps, error bounds are provided based on the singular values of the feature matrix. We demonstrate that appropriate weighting from prior knowledge can improve the generalization capability of the learned model.
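Weighting in both the data space and the parameter space can be illustrated with a generic weighted least-squares solve. This is a sketch under simple assumptions (diagonal weights, a dense feature matrix, a pseudoinverse-based solution), not the estimator analyzed in the paper. The names `weighted_least_squares`, `data_weights`, and `param_weights` are hypothetical.

```python
import numpy as np

def weighted_least_squares(features, targets, data_weights, param_weights):
    """Fit theta = Wp @ w, where w is the minimum-norm minimizer of ||Wd (Psi Wp w - y)||_2."""
    Wd = np.diag(data_weights)      # reweights individual data points
    Wp = np.diag(param_weights)     # encodes prior scaling of the parameters
    w = np.linalg.pinv(Wd @ features @ Wp) @ (Wd @ targets)
    return Wp @ w

rng = np.random.default_rng(1)
theta_true = np.array([1.0, -2.0, 0.5, 3.0])

# Under-parameterized, noise-free case: more data points than parameters.
Psi_tall = rng.standard_normal((30, 4))
theta_tall = weighted_least_squares(
    Psi_tall, Psi_tall @ theta_true,
    data_weights=np.linspace(1.0, 2.0, 30),
    param_weights=np.array([1.0, 0.5, 0.25, 0.125]),
)

# Over-parameterized case: the parameter weights select which interpolant is returned.
Psi_wide = rng.standard_normal((4, 10))
y_wide = rng.standard_normal(4)
theta_flat = weighted_least_squares(Psi_wide, y_wide, np.ones(4), np.ones(10))
theta_decay = weighted_least_squares(Psi_wide, y_wide, np.ones(4), 1.0 / np.arange(1, 11))
```

In the over-parameterized case both solutions fit the data, but decaying parameter weights favor the leading features, which is one way prior knowledge can enter the fit.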

NA · Nov 15, 2019
The quadratic Wasserstein metric for inverse data matching

Björn Engquist, Kui Ren, Yunan Yang

This work characterizes, analytically and numerically, two major effects of the quadratic Wasserstein ($W_2$) distance as the measure of data discrepancy in computational solutions of inverse problems. First, we show, in the infinite-dimensional setup, that the $W_2$ distance has a smoothing effect on the inversion process, making it robust against high-frequency noise in the data but leading to a reduced resolution for the reconstructed objects at a given noise level. Second, we demonstrate that for some finite-dimensional problems, the $W_2$ distance leads to optimization problems that have better convexity than the classical $L^2$ and $H^{-1}$ distances, making it a preferable distance to use when solving such inverse matching problems.