Maarten V. de Hoop

LG
h-index43
39papers
557citations
Novelty57%
AI Score59

39 Papers

54.9LGJun 1
Flowers: A Warp Drive for Neural PDE Solvers

Till Muser, Alexandra Spitzer, Matti Lassas et al.

We introduce Flowers, a neural architecture for learning PDE solution operators built entirely from multihead warps. Aside from pointwise channel mixing and a multiscale scaffold, Flowers use no Fourier multipliers, no dot-product attention, and no convolutional mixing. Each head predicts a displacement field and warps the mixed input features. Motivated by physics and computational efficiency, displacements are predicted pointwise, without any spatial aggregation, and nonlocality enters only through sparse sampling at source coordinates, one per head. Stacking warps in multiscale residual blocks yields Flowers, which implement adaptive, global interactions at linear cost. We theoretically motivate this design through three complementary lenses: flow maps for conservation laws, waves in inhomogeneous media, and a kinetic-theoretic continuum limit. Flowers achieve excellent performance on a broad suite of 2D and 3D time-dependent PDE benchmarks, particularly flows and waves. A compact 17M-parameter model consistently outperforms Fourier, convolution, and attention-based baselines of similar size, while a 150M-parameter variant improves over recent transformer-based foundation models with much more parameters, data, and training compute.

98.3NAMay 21
Dimension-Free Multimodal Sampling via Preconditioned Annealed Langevin Dynamics

Lorenzo Baldassari, Josselin Garnier, Knut Solna et al.

Designing sampling algorithms for multimodal targets that remain stable under refinement of the finite-dimensional approximation of an underlying function-space problem is a central challenge. Annealed Langevin dynamics (ALD) is a natural alternative to classical Langevin in this context, since it is often observed to improve exploration across modes. Yet a gap remains between its empirical success and existing theory: under which conditions can ALD be guaranteed to remain stable across dimensions? In this paper, we bridge this gap by providing a uniform-in-dimension analysis of continuous-time ALD for Gaussian-mixture targets. Along an explicit annealing path obtained by gradually removing Gaussian smoothing from the target, we identify spectral conditions linking the smoothing covariance to the component covariances under which ALD achieves a prescribed accuracy in Kullback-Leibler divergence within a dimension-uniform time horizon. We then establish stability in a perturbative regime with imperfect initialization and approximate scores. Under a misspecified-mixture score model, we show that preconditioning ALD with an operator whose spectrum decays sufficiently fast prevents error terms from accumulating across coordinates and thereby preserves dimension-uniform control.

LGJan 27, 2023
Out-of-distributional risk bounds for neural operators with applications to the Helmholtz equation

J. Antonio Lara Benitez, Takashi Furuya, Florian Faucher et al. · eth-zurich

Despite their remarkable success in approximating a wide range of operators defined by PDEs, existing neural operators (NOs) do not necessarily perform well for all physics problems. We focus here on high-frequency waves to highlight possible shortcomings. To resolve these, we propose a subfamily of NOs enabling an enhanced empirical approximation of the nonlinear operator mapping wave speed to solution, or boundary values for the Helmholtz equation on a bounded domain. The latter operator is commonly referred to as the ''forward'' operator in the study of inverse problems. Our methodology draws inspiration from transformers and techniques such as stochastic depth. Our experiments reveal certain surprises in the generalization and the relevance of introducing stochastic depth. Our NOs show superior performance as compared with standard NOs, not only for testing within the training distribution but also for out-of-distribution scenarios. To delve into this observation, we offer an in-depth analysis of the Rademacher complexity associated with our modified models and prove an upper bound tied to their stochastic depth that existing NOs do not satisfy. Furthermore, we obtain a novel out-of-distribution risk bound tailored to Gaussian measures on Banach spaces, again relating stochastic depth with the bound. We conclude by proposing a hypernetwork version of the subfamily of NOs as a surrogate model for the mentioned forward operator.

FAJan 18, 2012
Local Analysis of Inverse Problems: Hölder Stability and Iterative Reconstruction

Maarten V. de Hoop, Lingyun Qiu, Otmar Scherzer

We consider a class of inverse problems defined by a nonlinear map from parameter or model functions to the data. We assume that solutions exist. The space of model functions is a Banach space which is smooth and uniformly convex; however, the data space can be an arbitrary Banach space. We study sequences of parameter functions generated by a nonlinear Landweber iteration and conditions under which these strongly converge, locally, to the solutions within an appropriate distance. We express the conditions for convergence in terms of Hölder stability of the inverse maps, which ties naturally to the analysis of inverse problems.

NAMar 16, 2013
Multi-scale discrete approximations of Fourier integral operators associated with canonical transformations and caustics

Maarten V. de Hoop, Gunther Uhlmann, Andras Vasy et al.

We develop an algorithm for the computation of general Fourier integral operators associated with canonical graphs. The algorithm is based on dyadic parabolic decomposition using wave packets and enables the discrete approximate evaluation of the action of such operators on data in the presence of caustics. The procedure consists in the construction of a universal operator representation through the introduction of locally singularity-resolving diffeomorphisms, enabling the application of wave packet driven computation, and in the construction of the associated pseudo-differential joint-partition of unity on the canonical graphs. We apply the method to a parametrix of the wave equation in the vicinity of a cusp singularity.

FANov 10, 2015
Optimal Convergence Rates Results for Linear Inverse Problems in Hilbert Spaces

Vinicius Albani, Peter Elbau, Maarten V. de Hoop et al.

In this paper, we prove optimal convergence rates results for regularisation methods for solving linear ill-posed operator equations in Hilbert spaces. The result generalises existing convergence rates results on optimality to general source conditions, such as logarithmic source conditions. Moreover, we also provide optimality results under variational source conditions and show the connection to approximative source conditions.

NANov 29, 2018
Solving the 3D High-Frequency Helmholtz Equation using Contour Integration and Polynomial Preconditioning

Xiao Liu, Yuanzhe Xi, Yousef Saad et al.

We propose an iterative solution method for the 3D high-frequency Helmholtz equation that exploits a contour integral formulation of spectral projectors. In this framework, the solution in certain invariant subspaces is approximated by solving complex-shifted linear systems, resulting in faster GMRES iterations due to the restricted spectrum. The shifted systems are solved by exploiting a polynomial fixed-point iteration, which is a robust scheme even if the magnitude of the shift is small. Numerical tests in 3D indicate that $O(n^{1/3})$ matrix-vector products are needed to solve a high-frequency problem with a matrix size $n$ with high accuracy. The method has a small storage requirement, can be applied to both dense and sparse linear systems, and is highly parallelizable.

FASep 26, 2014
Generalized Convergence Rates Results for Linear Inverse Problems in Hilbert Spaces

Roman Andreev, Peter Elbau, Maarten V. de Hoop et al.

In recent years, a series of convergence rates conditions for regularization methods has been developed. Mainly, the motivations for developing novel conditions came from the desire to carry over convergence rates results from the Hilbert space setting to generalized Tikhonov regularization in Banach spaces. For instance, variational source conditions have been developed and they were expected to be equivalent to standard source conditions for linear inverse problems in a Hilbert space setting. We show that this expectation does not hold. However, in the standard Hilbert space setting these novel conditions are optimal, which we prove by using some deep results from Neubauer, and generalize existing convergence rates results. The key tool in our analysis is a novel source condition, which we put into relation to the existing source conditions from the literature. As a positive by-product, convergence rates results can be proven without spectral theory, which is the standard technique for proving convergence rates for linear inverse problems in Hilbert spaces.

NAJun 16, 2012
A convergence analysis of a multi-level projected steepest descent iteration for nonlinear inverse problems in Banach spaces subject to stability constraints

Maarten V. de Hoop, Lingyun Qiu, Otmar Scherzer

We consider nonlinear inverse problems described by operator equations in Banach spaces. Assuming conditional stability of the inverse problem, that is, assuming that stability holds on a closed, convex subset of the domain of the operator, we introduce a novel nonlinear projected steepest descent iteration and analyze its convergence to an approximate solution given limited accuracy data. We proceed with developing a multi-level algorithm based on a nested family of closed, convex subsets on which stability holds and the stability constants are ordered. Growth of the stability constants is coupled to the increase in accuracy of approximation between neighboring levels to ensure that the algorithm can continue from level to level until the iterate satisfies a desired discrepancy criterion, after a finite number of steps.

LGJun 6, 2023
Globally injective and bijective neural operators

Takashi Furuya, Michael Puthawala, Matti Lassas et al.

Recently there has been great interest in operator learning, where networks learn operators between function spaces from an essentially infinite-dimensional perspective. In this work we present results for when the operators learned by these networks are injective and surjective. As a warmup, we combine prior work in both the finite-dimensional ReLU and operator learning setting by giving sharp conditions under which ReLU layers with linear neural operators are injective. We then consider the case the case when the activation function is pointwise bijective and obtain sufficient conditions for the layer to be injective. We remark that this question, while trivial in the finite-rank case, is subtler in the infinite-rank case and is proved using tools from Fredholm theory. Next, we prove that our supplied injective neural operators are universal approximators and that their implementation, with finite-rank neural networks, are still injective. This ensures that injectivity is not `lost' in the transcription from analytical operators to their finite-rank implementation with networks. Finally, we conclude with an increase in abstraction and consider general conditions when subnetworks, which may be many layers deep, are injective and surjective and provide an exact inversion from a `linearization.' This section uses general arguments from Fredholm theory and Leray-Schauder degree theory for non-linear integral equations to analyze the mapping properties of neural operators in function spaces. These results apply to subnetworks formed from the layers considered in this work, under natural conditions. We believe that our work has applications in Bayesian UQ where injectivity enables likelihood estimation and in inverse problems where surjectivity and injectivity corresponds to existence and uniqueness, respectively.

NAMay 24, 2019
A weight-adjusted discontinuous Galerkin method for the poroelastic wave equation: penalty fluxes and micro-heterogeneities

Khemraj Shukla, Jesse Chan, Maarten V. de Hoop et al.

We introduce a high-order weight-adjusted discontinuous Galerkin (WADG) scheme for the numerical solution of three-dimensional (3D) wave propagation problems in anisotropic porous media. We use a coupled first-order symmetric stress-velocity formulation. Careful attention is directed at (a) the derivation of an energy-stable penalty-based numerical flux, which offers high-order accuracy in presence of material discontinuities, and (b) proper treatment of micro-heterogeneities (sub-element variations) in the numerical scheme. The use of a penalty-based numerical flux avoids the diagonalization of Jacobian matrices into polarized wave constituents necessary when solving element-wise Riemann problems. Micro-heterogeneities are accurately and stably incorporated in the numerical scheme using easily-invertible weight-adjusted mass matrices. The convergence of the proposed numerical scheme is proven and verified by using convergence studies against analytical plane wave solutions. The proposed method is also compared against an existing implementation using the spectral element method to solve the poroelastic wave equation.

LGApr 24, 2023
An Approximation Theory for Metric Space-Valued Functions With A View Towards Deep Learning

Anastasis Kratsios, Chong Liu, Matti Lassas et al.

Motivated by the developing mathematics of deep learning, we build universal functions approximators of continuous maps between arbitrary Polish metric spaces $\mathcal{X}$ and $\mathcal{Y}$ using elementary functions between Euclidean spaces as building blocks. Earlier results assume that the target space $\mathcal{Y}$ is a topological vector space. We overcome this limitation by ``randomization'': our approximators output discrete probability measures over $\mathcal{Y}$. When $\mathcal{X}$ and $\mathcal{Y}$ are Polish without additional structure, we prove very general qualitative guarantees; when they have suitable combinatorial structure, we prove quantitative guarantees for Hölder-like maps, including maps between finite graphs, solution operators to rough differential equations between certain Carnot groups, and continuous non-linear operators between Banach spaces arising in inverse problems. In particular, we show that the required number of Dirac measures is determined by the combinatorial structure of $\mathcal{X}$ and $\mathcal{Y}$. For barycentric $\mathcal{Y}$, including Banach spaces, $\mathbb{R}$-trees, Hadamard manifolds, or Wasserstein spaces on Polish metric spaces, our approximators reduce to $\mathcal{Y}$-valued functions. When the Euclidean approximators are neural networks, our constructions generalize transformer networks, providing a new probabilistic viewpoint of geometric deep learning.

LGJan 27, 2023
Unearthing InSights into Mars: Unsupervised Source Separation with Limited Data

Ali Siahkoohi, Rudy Morel, Maarten V. de Hoop et al.

Source separation involves the ill-posed problem of retrieving a set of source signals that have been observed through a mixing operator. Solving this problem requires prior knowledge, which is commonly incorporated by imposing regularity conditions on the source signals, or implicitly learned through supervised or unsupervised methods from existing data. While data-driven methods have shown great promise in source separation, they often require large amounts of data, which rarely exists in planetary space missions. To address this challenge, we propose an unsupervised source separation scheme for domains with limited data access that involves solving an optimization problem in the wavelet scattering covariance representation space$\unicode{x2014}$an interpretable, low-dimensional representation of stationary processes. We present a real-data example in which we remove transient, thermally-induced microtilts$\unicode{x2014}$known as glitches$\unicode{x2014}$from data recorded by a seismometer during NASA's InSight mission on Mars. Thanks to the wavelet scattering covariances' ability to capture non-Gaussian properties of stochastic processes, we are able to separate glitches using only a few glitch-free data snippets.

SPOct 1, 2023
Implicit Neural Representations and the Algebra of Complex Wavelets

T. Mitchell Roddenberry, Vishwanath Saragadam, Maarten V. de Hoop et al.

Implicit neural representations (INRs) have arisen as useful methods for representing signals on Euclidean domains. By parameterizing an image as a multilayer perceptron (MLP) on Euclidean space, INRs effectively represent signals in a way that couples spatial and spectral features of the signal that is not obvious in the usual discrete representation, paving the way for continuous signal processing and machine learning approaches that were not previously possible. Although INRs using sinusoidal activation functions have been studied in terms of Fourier theory, recent works have shown the advantage of using wavelets instead of sinusoids as activation functions, due to their ability to simultaneously localize in both frequency and space. In this work, we approach such INRs and demonstrate how they resolve high-frequency features of signals from coarse approximations done in the first layer of the MLP. This leads to multiple prescriptions for the design of INR architectures, including the use of complex wavelets, decoupling of low and band-pass approximations, and initialization schemes based on the singularities of the desired signal.

CLAug 2, 2024
Transformers are Universal In-context Learners

Takashi Furuya, Maarten V. de Hoop, Gabriel Peyré

Transformers are deep architectures that define "in-context mappings" which enable predicting new tokens based on a given set of tokens (such as a prompt in NLP applications or a set of patches for a vision transformer). In this work, we study in particular the ability of these architectures to handle an arbitrarily large number of context tokens. To mathematically, uniformly address their expressivity, we consider the case that the mappings are conditioned on a context represented by a probability distribution of tokens which becomes discrete for a finite number of these. The relevant notion of smoothness then corresponds to continuity in terms of the Wasserstein distance between these contexts. We demonstrate that deep transformers are universal and can approximate continuous in-context mappings to arbitrary precision, uniformly over compact token domains. A key aspect of our results, compared to existing findings, is that for a fixed precision, a single transformer can operate on an arbitrary (even infinite) number of tokens. Additionally, it operates with a fixed embedding dimension of tokens (this dimension does not increase with precision) and a fixed number of heads (proportional to the dimension). The use of MLPs between multi-head attention layers is also explicitly controlled. We consider both unmasked attentions (as used for the vision transformer) and masked causal attentions (as used for NLP and time series applications). We tackle the causal setting leveraging a space-time lifting to analyze causal attention as a mapping over probability distributions of tokens.

GEO-PHOct 21, 2024Code
SeisLM: a Foundation Model for Seismic Waveforms

Tianlin Liu, Jannes Münchmeyer, Laura Laurenti et al.

We introduce the Seismic Language Model (SeisLM), a foundational model designed to analyze seismic waveforms -- signals generated by Earth's vibrations such as the ones originating from earthquakes. SeisLM is pretrained on a large collection of open-source seismic datasets using a self-supervised contrastive loss, akin to BERT in language modeling. This approach allows the model to learn general seismic waveform patterns from unlabeled data without being tied to specific downstream tasks. When fine-tuned, SeisLM excels in seismological tasks like event detection, phase-picking, onset time regression, and foreshock-aftershock classification. The code has been made publicly available on https://github.com/liutianlin0121/seisLM.

73.0LGMay 18
Function graph transformers universally approximate operators between function spaces

Takashi Furuya, David Mis, Ivan Dokmanić et al.

We study the approximation of nonlinear operators between function spaces by transformers. Our approach is to lift functions to measures supported on their graphs and leverage a recently introduced measure-theoretic view of transformers. A function $h$ is represented by its graph measure $γ_h$, with finite tokens $\{(x_j,h(x_j))\}_{j=1}^N$ being its empirical approximations. We show that this framework elegantly models discretization refinement via convergence of measures and provides a natural setting for operator learning. Within this framework, we introduce function graph transformers, a graph-preserving subclass of measure-theoretic transformers that maps graph measures to graph measures, which is to say that outputs remain single-valued functions. Crucially, this additional structure does not reduce generality: we prove that the resulting graph-preserving maps can be approximated by finite compositions of standard softmax self-attention layers and pointwise MLPs, yielding universal approximation results for broad classes of nonlinear operators. Unlike existing theoretical approaches to operator learning with transformers, the measure-theoretic framework also accommodates regularized negative-order Sobolev inputs for which discretization invariance is particularly challenging, as well as query points on different output domains. Overall, function graph transformers provide a continuum viewpoint and mathematical toolkit for transformer-based operator learning, clarifying the roles of positional encodings, graph structure, regularization, and ensuring consistency across discretizations.

LGDec 10, 2025
Rates and architectures for learning geometrically non-trivial operators

T. Mitchell Roddenberry, Leo Tzou, Ivan Dokmanić et al.

Deep learning methods have proven capable of recovering operators between high-dimensional spaces, such as solution maps of PDEs and similar objects in mathematical physics, from very few training samples. This phenomenon of data-efficiency has been proven for certain classes of elliptic operators with simple geometry, i.e., operators that do not change the domain of the function or propagate singularities. However, scientific machine learning is commonly used for problems that do involve the propagation of singularities in a priori unknown ways, such as waves, advection, and fluid dynamics. In light of this, we expand the learning theory to include double fibration transforms--geometric integral operators that include generalized Radon and geodesic ray transforms. We prove that this class of operators does not suffer from the curse of dimensionality: the error decays superalgebraically, that is, faster than any fixed power of the reciprocal of the number of training samples. Furthermore, we investigate architectures that explicitly encode the geometry of these transforms, demonstrating that an architecture reminiscent of cross-attention based on levelset methods yields a parameterization that is universal, stable, and learns double fibration transforms from very few training examples. Our results contribute to a rapidly-growing line of theoretical work on learning operators for scientific machine learning.

96.3OCMay 17
Training Infinitely Deep and Wide Transformers

Raphaël Barboni, Maarten V. de Hoop, Takashi Furuya et al.

Transformers have become the dominant architecture in modern machine learning, yet the theoretical understanding of their training dynamics remains limited. This paper develops a rigorous mathematical framework for analyzing gradient-based training of transformers in the mean-field regime, where both the depth (number of layers) and width (number of attention heads) tend to infinity. While ResNet training can be understood as controlling a neural ODE, transformer training corresponds to controlling a neural PDE, due to the coupling of multiple token distributions through the attention mechanism. Our mean-field model features two types of measure representations: token distributions evolving through layers and attention parameters at each layer. We establish well-posedness of the forward pass through infinitely deep transformers, characterizing token evolution via flow maps that satisfy ODEs in function spaces. Using adjoint sensitivity analysis, we derive an explicit formula for the conditional Wasserstein gradient of the training risk, involving adjoint variables governed by backward ODEs. We prove the existence and uniqueness of gradient flow curves in the conditional Wasserstein metric space, establishing a rigorous foundation for gradient-based transformer training. A key technical contribution is providing necessary and sufficient conditions for injectivity of the Neural Tangent Kernel (NTK) for attention mechanisms: we show that NTK injectivity is equivalent to linear independence of log-sum-exp functions modulo affine functions, a condition satisfied by diverse token distributions, including discrete distributions, uniform distributions, and Gaussian mixtures. Under this NTK injectivity assumption, we prove that gradient flow converges to global minima when the initial loss is sufficiently small, eliminating spurious local minima from the optimization landscape.

GEO-PHJul 14, 2023
High-Rate Phase Association with Travel Time Neural Fields

Cheng Shi, Giulio Poggiali, Chris Marone et al.

Earthquake science and seismology rely on the ability to associate seismic waves with their originating earthquakes. Earthquake detection algorithms based on deep learning have progressed rapidly and now routinely detect microearthquakes with unprecedented clarity, providing information about fault dynamics on increasingly finer spatiotemporal scales. However, this densification of detections can overwhelm existing techniques for phase association which rely on fixed wave speed models and associate events one by one. These methods fail when the event rates become high or where the 4D complexity of elastic wave speeds cannot be ignored. Here, we introduce HARPA, a deep learning solution to this problem. HARPA is a high-rate association framework which incorporates wave physics by leveraging deep generative models and travel time neural fields. Instead of associating events one by one, it lifts arrival sequences to probability distributions and compares them using an optimal transport metric. The generative travel time neural fields are used to estimate the wave speed simultaneously with association. HARPA outperforms state-of-the-art association methods for both real seismic data and complex synthetic models and paves the way for improved understanding of seismicity while establishing a new seismic data analysis paradigm.

74.3MLMar 20
Forward and inverse problems for measure flows in Bayes Hilbert spaces

S. David Mis, Maarten V. de Hoop

We study forward and inverse problems for time-dependent probability measures in Bayes--Hilbert spaces. On the forward side, we show that each sufficiently regular Bayes--Hilbert path admits a canonical dynamical realization: a weighted Neumann problem transforms the log-density variation into the unique gradient velocity field of minimum kinetic energy. This construction induces a transport form on Bayes--Hilbert tangent directions, which measures the dynamical cost of realizing prescribed motions, and yields a flow-matching interpretation in which the canonical velocity field is the minimum-energy execution of the prescribed path. On the inverse side, we formulate reconstruction directly on Bayes--Hilbert path space from time-dependent indirect observations. The resulting variational problem combines a data-misfit term with the transport action induced by the forward geometry. In our infinite-dimensional setting, however, this transport geometry alone does not provide sufficient compactness, so we add explicit temporal and spatial regularization to close the theory. The linearized observation operator induces a complementary observability form, which quantifies how strongly tangent directions are seen through the data. Under explicit Sobolev regularity and observability assumptions, we prove existence of minimizers, derive first-variation formulas, establish local stability of the observation map, and deduce recovery of the evolving law, its score, and its canonical velocity field under the strong topologies furnished by the compactness theory.

70.9MLMay 15
Dimension-Uniform Discretization Analysis of Preconditioned Annealed Langevin Dynamics for Multimodal Gaussian Mixtures

Lorenzo Baldassari, Josselin Garnier, Knut Solna et al.

Obtaining stable diffusion-based samplers in high- and infinite-dimensional settings is challenging because errors can accumulate across high-frequency coordinates and make the dynamics unstable under refinement of the finite-dimensional approximation of the underlying function-space problem. Discretization is a typical source of such errors, and preconditioning with a suitable spectral decay is one way to control their accumulation. In this paper, we study this problem for preconditioned annealed Langevin dynamics (ALD) applied to Gaussian mixtures. We first show that Euler-Maruyama (EM) discretization, by treating the stiff linear part of the annealed score with a forward Euler step, imposes a stability constraint coupling the preconditioner with the annealed covariance scale. Together with the conditions ensuring dimension-uniform control of the annealed dynamics, this constraint forces the initial smoothed law to remain uniformly close to the target across dimensions. We then consider an exponential-integrator scheme that integrates the stiff linear part of the annealed score exactly. Under explicit spectral summability conditions coupling the smoothing covariance, the component covariance spectra, and the preconditioner, we prove a dimension-uniform Kullback-Leibler (KL) bound for this scheme. This bound can be made arbitrarily small, uniformly in dimension, by allowing enough time for annealing and then refining the time mesh accordingly. Importantly, these conditions allow regimes in which the KL divergence between the target and the initial smoothed law diverges with dimension, showing that the restrictions imposed by EM are scheme-dependent rather than intrinsic to ALD.

SPJan 30
Hybrid operator learning of wave scattering maps in high-contrast media

Advait Balaji, Trevor Teolis, S. David Mis et al.

Surrogate modeling of wave propagation and scattering (i.e. the wave speed and source to wave field map) in heterogeneous media has significant potential in applications such as seismic imaging and inversion. High-contrast settings, such as subsurface models with salt bodies, exhibit strong scattering and phase sensitivity that challenge existing neural operators. We propose a hybrid architecture that decomposes the scattering operator into two separate contributions: a smooth background propagation and a high-contrast scattering correction. The smooth component is learned with a Fourier Neural Operator (FNO), which produces globally coupled feature tokens encoding background wave propagation; these tokens are then passed to a vision transformer, where attention is used to model the high-contrast scattering correction dominated by strong, spatial interactions. Evaluated on high-frequency Helmholtz problems with strong contrasts, the hybrid model achieves substantially improved phase and amplitude accuracy compared to standalone FNOs or transformers, with favorable accuracy-parameter scaling.

51.9STMay 8
On Observation Time for Recovering Latent Hawkes Networks

Jonas Linkerhägner, Michele Bortolasi, Lorenzo Baldassari et al.

Dynamics of interacting systems in engineering, society, and nature often evolve over latent networks that govern which entities can interact. We study the problem of inferring these networks from event-based observations, which arise naturally in finance, seismology, and neuroscience. While there is substantial algorithmic work addressing this important problem, theoretical results are scarce. In this paper we ask the following fundamental question: what is the minimum time that one must observe the dynamics in order to exactly recover the underlying network, as a function of the number $d$ of interacting entities? For a class of stationary Hawkes processes with sparse, weak interactions, we prove that an observation time of order $\log d$ is sufficient and necessary. For the upper bound we construct a two-stage estimator that uses clipped and binned event data for screening, followed by a least-squares refinement, and apply concentration bounds derived from the Poisson cluster representation. For the lower bound we combine Fano's inequality with Jacod's Girsanov formula for point processes on a suitable subclass of networks.

LGJan 12, 2025
Neural equilibria for long-term prediction of nonlinear conservation laws

J. Antonio Lara Benitez, Junyi Guo, Kareem Hegazy et al.

We introduce Neural Discrete Equilibrium (NeurDE), a machine learning (ML) approach for long-term forecasting of flow phenomena that relies on a "lifting" of physical conservation laws into the framework of kinetic theory. The kinetic formulation provides an excellent structure for ML algorithms by separating nonlinear, non-local physics into a nonlinear but local relaxation to equilibrium and a linear non-local transport. This separation allows the ML to focus on the local nonlinear components while addressing the simpler linear transport with efficient classical numerical algorithms. To accomplish this, we design an operator network that maps macroscopic observables to equilibrium states in a manner that maximizes entropy, yielding expressive BGK-type collisions. By incorporating our surrogate equilibrium into the lattice Boltzmann (LB) algorithm, we achieve accurate flow forecasts for a wide range of challenging flows. We show that NeurDE enables accurate prediction of compressible flows, including supersonic flows, while tracking shocks over hundreds of time steps, using a small velocity lattice-a heretofore unattainable feat without expensive numerical root finding.

MLMay 24, 2024
Taming Score-Based Diffusion Priors for Infinite-Dimensional Nonlinear Inverse Problems

Lorenzo Baldassari, Ali Siahkoohi, Josselin Garnier et al.

This work introduces a sampling method capable of solving Bayesian inverse problems in function space. It does not assume the log-concavity of the likelihood, meaning that it is compatible with nonlinear inverse problems. The method leverages the recently defined infinite-dimensional score-based diffusion models as a learning-based prior, while enabling provable posterior sampling through a Langevin-type MCMC algorithm defined on function spaces. A novel convergence analysis is conducted, inspired by the fixed-point methods established for traditional regularization-by-denoising algorithms and compatible with weighted annealing. The obtained convergence bound explicitly depends on the approximation error of the score; a well-approximated score is essential to obtain a well-approximated posterior. Stylized and PDE-based examples are provided, demonstrating the validity of our convergence analysis. We conclude by presenting a discussion of the method's challenges related to learning the score and computational complexity.

LGDec 4, 2024
Can neural operators always be continuously discretized?

Takashi Furuya, Michael Puthawala, Maarten V. de Hoop et al.

We consider the problem of discretization of neural operators between Hilbert spaces in a general framework including skip connections. We focus on bijective neural operators through the lens of diffeomorphisms in infinite dimensions. Framed using category theory, we give a no-go theorem that shows that diffeomorphisms between Hilbert spaces or Hilbert manifolds may not admit any continuous approximations by diffeomorphisms on finite-dimensional spaces, even if the approximations are nonlinear. The natural way out is the introduction of strongly monotone diffeomorphisms and layerwise strongly monotone neural operators which have continuous approximations by strongly monotone diffeomorphisms on finite-dimensional spaces. For these, one can guarantee discretization invariance, while ensuring that finite-dimensional approximations converge not only as sequences of functions, but that their representations converge in a suitable sense as well. Finally, we show that bilipschitz neural operators may always be written in the form of an alternating composition of strongly monotone neural operators, plus a simple isometry. Thus we realize a rigorous platform for discretization of a generalization of a neural operator. We also show that neural operators of this type may be approximated through the composition of finite-rank residual neural operators, where each block is strongly monotone, and may be inverted locally via iteration. We conclude by providing a quantitative approximation result for the discretization of general bilipschitz neural operators.

NANov 25, 2025
Extension and neural operator approximation of the electrical impedance tomography inverse map

Maarten V. de Hoop, Nikola B. Kovachki, Matti Lassas et al.

This paper considers the problem of noise-robust neural operator approximation for the solution map of Calderón's inverse conductivity problem. In this continuum model of electrical impedance tomography (EIT), the boundary measurements are realized as a noisy perturbation of the Neumann-to-Dirichlet map's integral kernel. The theoretical analysis proceeds by extending the domain of the inversion operator to a Hilbert space of kernel functions. The resulting extension shares the same stability properties as the original inverse map from kernels to conductivities, but is now amenable to neural operator approximation. Numerical experiments demonstrate that Fourier neural operators excel at reconstructing infinite-dimensional piecewise constant and lognormal conductivities in noisy setups both within and beyond the theory's assumptions. The methodology developed in this paper for EIT exemplifies a broader strategy for addressing nonlinear inverse problems with a noise-aware operator learning framework.

CLSep 30, 2025
Transformers through the lens of support-preserving maps between measures

Takashi Furuya, Maarten V. de Hoop, Matti Lassas

Transformers are deep architectures that define ``in-context maps'' which enable predicting new tokens based on a given set of tokens (such as a prompt in NLP applications or a set of patches for a vision transformer). In previous work, we studied the ability of these architectures to handle an arbitrarily large number of context tokens. To mathematically, uniformly analyze their expressivity, we considered the case that the mappings are conditioned on a context represented by a probability distribution which becomes discrete for a finite number of tokens. Modeling neural networks as maps on probability measures has multiple applications, such as studying Wasserstein regularity, proving generalization bounds and doing a mean-field limit analysis of the dynamics of interacting particles as they go through the network. In this work, we study the question what kind of maps between measures are transformers. We fully characterize the properties of maps between measures that enable these to be represented in terms of in-context maps via a push forward. On the one hand, these include transformers; on the other hand, transformers universally approximate representations with any continuous in-context map. These properties are preserving the cardinality of support and that the regular part of their Fréchet derivative is uniformly continuous. Moreover, we show that the solution map of the Vlasov equation, which is of nonlocal transport type, for interacting particle systems in the mean-field regime for the Cauchy problem satisfies the conditions on the one hand and, hence, can be approximated by a transformer; on the other hand, we prove that the measure-theoretic self-attention has the properties that ensure that the infinite depth, mean-field measure-theoretic transformer can be identified with a Vlasov flow.

MLMay 23, 2025
Preconditioned Langevin Dynamics with Score-Based Generative Models for Infinite-Dimensional Linear Bayesian Inverse Problems

Lorenzo Baldassari, Josselin Garnier, Knut Solna et al.

Designing algorithms for solving high-dimensional Bayesian inverse problems directly in infinite-dimensional function spaces - where such problems are naturally formulated - is crucial to ensure stability and convergence as the discretization of the underlying problem is refined. In this paper, we contribute to this line of work by analyzing a widely used sampler for linear inverse problems: Langevin dynamics driven by score-based generative models (SGMs) acting as priors, formulated directly in function space. Building on the theoretical framework for SGMs in Hilbert spaces, we give a rigorous definition of this sampler in the infinite-dimensional setting and derive, for the first time, error estimates that explicitly depend on the approximation error of the score. As a consequence, we obtain sufficient conditions for global convergence in Kullback-Leibler divergence on the underlying function space. Preventing numerical instabilities requires preconditioning of the Langevin algorithm and we prove the existence and the form of an optimal preconditioner. The preconditioner depends on both the score error and the forward operator and guarantees a uniform convergence rate across all posterior modes. Our analysis applies to both Gaussian and a general class of non-Gaussian priors. Finally, we present examples that illustrate and validate our theoretical findings.

LGJan 2, 2025
Semialgebraic Neural Networks: From roots to representations

S. David Mis, Matti Lassas, Maarten V. de Hoop

Many numerical algorithms in scientific computing -- particularly in areas like numerical linear algebra, PDE simulation, and inverse problems -- produce outputs that can be represented by semialgebraic functions; that is, the graph of the computed function can be described by finitely many polynomial equalities and inequalities. In this work, we introduce Semialgebraic Neural Networks (SANNs), a neural network architecture capable of representing any bounded semialgebraic function, and computing such functions up to the accuracy of a numerical ODE solver chosen by the programmer. Conceptually, we encode the graph of the learned function as the kernel of a piecewise polynomial selected from a class of functions whose roots can be evaluated using a particular homotopy continuation method. We show by construction that the SANN architecture is able to execute this continuation method, thus evaluating the learned semialgebraic function. Furthermore, the architecture can exactly represent even discontinuous semialgebraic functions by executing a continuation method on each connected component of the target function. Lastly, we provide example applications of these networks and show they can be trained with traditional deep-learning techniques.

MLMay 24, 2024
An Unconditional Representation of the Conditional Score in Infinite-Dimensional Linear Inverse Problems

Fabian Schneider, Duc-Lam Duong, Matti Lassas et al.

Score-based diffusion models (SDMs) have emerged as a powerful tool for sampling from the posterior distribution in Bayesian inverse problems. However, existing methods often require multiple evaluations of the forward mapping to generate a single sample, resulting in significant computational costs for large-scale inverse problems. To address this, we propose an unconditional representation of the conditional score function (UCoS) tailored to linear inverse problems, which avoids forward model evaluations during sampling by shifting computational effort to an offline training phase. In this phase, a \emph{task-dependent} score function is learned based on the linear forward operator. Crucially, we show that the conditional score can be derived \emph{exactly} from a trained (unconditional) score using affine transformations, eliminating the need for conditional score approximations. Our approach is formulated in infinite-dimensional function spaces, making it inherently discretization-invariant. We support this formulation with a rigorous convergence analysis that justifies UCoS beyond any specific discretization. Finally we validate UCoS through high-dimensional computed tomography (CT) and image deblurring experiments, demonstrating both scalability and accuracy.

MLMay 28, 2023
Conditional score-based diffusion models for Bayesian inference in infinite dimensions

Lorenzo Baldassari, Ali Siahkoohi, Josselin Garnier et al.

Since their initial introduction, score-based diffusion models (SDMs) have been successfully applied to solve a variety of linear inverse problems in finite-dimensional vector spaces due to their ability to efficiently approximate the posterior distribution. However, using SDMs for inverse problems in infinite-dimensional function spaces has only been addressed recently, primarily through methods that learn the unconditional score. While this approach is advantageous for some inverse problems, it is mostly heuristic and involves numerous computationally costly forward operator evaluations during posterior sampling. To address these limitations, we propose a theoretically grounded method for sampling from the posterior of infinite-dimensional Bayesian linear inverse problems based on amortized conditional SDMs. In particular, we prove that one of the most successful approaches for estimating the conditional score in finite dimensions - the conditional denoising estimator - can also be applied in infinite dimensions. A significant part of our analysis is dedicated to demonstrating that extending infinite-dimensional SDMs to the conditional setting requires careful consideration, as the conditional score typically blows up for small times, contrarily to the unconditional score. We conclude by presenting stylized and large-scale numerical examples that validate our approach, offer additional insights, and demonstrate that our method enables large-scale, discretization-invariant Bayesian inference.

LGMay 25, 2023
Multi-scale clustering and source separation of InSight mission seismic data

Ali Siahkoohi, Rudy Morel, Randall Balestriero et al.

Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources, and only access to a dataset of signal mixtures. This problem is inherently ill-posed and is further challenged by the variety of timescales exhibited by sources in time series data from planetary space missions. As such, a systematic multi-scale unsupervised approach is needed to identify and separate sources at different timescales. Existing methods typically rely on a preselected window size that determines their operating timescale, limiting their capacity to handle multi-scale sources. To address this issue, we propose an unsupervised multi-scale clustering and source separation framework by leveraging wavelet scattering spectra that provide a low-dimensional representation of stochastic processes, capable of distinguishing between different non-Gaussian stochastic processes. Nested within this representation space, we develop a factorial variational autoencoder that is trained to probabilistically cluster sources at different timescales. To perform source separation, we use samples from clusters at multiple timescales obtained via the factorial variational autoencoder as prior information and formulate an optimization problem in the wavelet scattering spectra representation space. When applied to the entire seismic dataset recorded during the NASA InSight mission on Mars, containing sources varying greatly in timescale, our approach disentangles such different sources, e.g., minute-long transient one-sided pulses (known as "glitches") and structured ambient noises resulting from atmospheric activities that typically last for tens of minutes, and provides an opportunity to conduct further investigations into the isolated sources.

STAug 27, 2021
Convergence Rates for Learning Linear Operators from Noisy Data

Maarten V. de Hoop, Nikola B. Kovachki, Nicholas H. Nelsen et al.

This paper studies the learning of linear operators between infinite-dimensional Hilbert spaces. The training data comprises pairs of random input vectors in a Hilbert space and their noisy images under an unknown self-adjoint linear operator. Assuming that the operator is diagonalizable in a known basis, this work solves the equivalent inverse problem of estimating the operator's eigenvalues given the data. Adopting a Bayesian approach, the theoretical analysis establishes posterior contraction rates in the infinite data limit with Gaussian priors that are not directly linked to the forward map of the inverse problem. The main results also include learning-theoretic generalization error guarantees for a wide range of distribution shifts. These convergence rates quantify the effects of data smoothness and true eigenvalue decay or growth, for compact or unbounded operators, respectively, on sample complexity. Numerical evidence supports the theory in diagonal and non-diagonal settings.

OCDec 23, 2019
Deep learning architectures for nonlinear operator functions and nonlinear inverse problems

Maarten V. de Hoop, Matti Lassas, Christopher A. Wong

We develop a theoretical analysis for special neural network architectures, termed operator recurrent neural networks, for approximating nonlinear functions whose inputs are linear operators. Such functions commonly arise in solution algorithms for inverse boundary value problems. Traditional neural networks treat input data as vectors, and thus they do not effectively capture the multiplicative structure associated with the linear operators that correspond to the data in such inverse problems. We therefore introduce a new family that resembles a standard neural network architecture, but where the input data acts multiplicatively on vectors. Motivated by compact operators appearing in boundary control and the analysis of inverse boundary value problems for the wave equation, we promote structure and sparsity in selected weight matrices in the network. After describing this architecture, we study its representation properties as well as its approximation properties. We furthermore show that an explicit regularization can be introduced that can be derived from the mathematical analysis of the mentioned inverse problems, and which leads to certain guarantees on the generalization properties. We observe that the sparsity of the weight matrices improves the generalization estimates. Lastly, we discuss how operator recurrent networks can be viewed as a deep learning analogue to deterministic algorithms such as boundary control for reconstructing the unknown wavespeed in the acoustic wave equation from boundary measurements.

GEO-PHDec 12, 2019
Attention network forecasts time-to-failure in laboratory shear experiments

Hope Jasperson, David C. Bolton, Paul Johnson et al.

Rocks under stress deform by creep mechanisms that include formation and slip on small-scale internal cracks. Intragranular cracks and slip along grain contacts release energy as elastic waves termed acoustic emissions (AE). AEs are thought to contain predictive information that can be used for fault failure forecasting. Here we present a method using unsupervised classification and an attention network to forecast labquakes using AE waveform features. Our data were generated in a laboratory setting using a biaxial shearing device with granular fault gouge intended to mimic the conditions of tectonic faults. Here we analyzed the temporal evolution of AEs generated throughout several hundred laboratory earthquake cycles. We used a Conscience Self-Organizing Map (CSOM) to perform topologically ordered vector quantization based on waveform properties. The resulting map was used to interactively cluster AEs. We examined the clusters over time to identify those with predictive ability. Finally, we used a variety of LSTM and attention-based networks to test the predictive power of the AE clusters. By tracking cumulative waveform features over the seismic cycle, the network is able to forecast the time-to-failure (TTF) of lab earthquakes. Our results show that analyzing the data to isolate predictive signals and using a more sophisticated network architecture are key to robustly forecasting labquakes. In the future, this method could be applied on tectonic faults monitor earthquakes and augment current early warning systems.

CVMay 29, 2018
Random mesh projectors for inverse problems

Sidharth Gupta, Konik Kothari, Maarten V. de Hoop et al.

We propose a new learning-based approach to solve ill-posed inverse problems in imaging. We address the case where ground truth training samples are rare and the problem is severely ill-posed - both because of the underlying physics and because we can only get few measurements. This setting is common in geophysical imaging and remote sensing. We show that in this case the common approach to directly learn the mapping from the measured data to the reconstruction becomes unstable. Instead, we propose to first learn an ensemble of simpler mappings from the data to projections of the unknown image into random piecewise-constant subspaces. We then combine the projections to form a final reconstruction by solving a deconvolution-like problem. We show experimentally that the proposed method is more robust to measurement noise and corruptions not seen during training than a directly learned inverse.

NASep 23, 2015
Multiscale reverse-time-migration-type imaging using the dyadic parabolic decomposition of phase space

Fredrik Andersson, Maarten V. de Hoop, Herwig Wendt

We develop a representation of reverse-time migration in terms of Fourier integral operators the canonical relations of which are graphs. Through the dyadic parabolic decomposition of phase space, we obtain the solution of the wave equation with a boundary source and homogeneous initial conditions using wave packets. On this basis, we develop a numerical procedure for the reverse time continuation from the boundary of scattering data and for RTM migration. The algorithms are derived from those we recently developed for the discrete approximate evaluation of the action of Fourier integral operators and inherit from their conceptual and numerical properties.