LGJul 17, 2023
Artificial Intelligence for Science in Quantum, Atomistic, and Continuum SystemsXuan Zhang, Limei Wang, Jacob Helwig et al. · cambridge, mit
Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.
LGSep 27, 2023
Neural Operators for Accelerating Scientific Simulations and DesignKamyar Azizzadenesheli, Nikola Kovachki, Zongyi Li et al.
Scientific discovery and engineering design are currently limited by the time and cost of physical experiments, selected mostly through trial-and-error and intuition that require deep domain expertise. Numerical simulations present an alternative to physical experiments but are usually infeasible for complex real-world domains due to the computational requirements of existing numerical methods. Artificial intelligence (AI) presents a potential paradigm shift by developing fast data-driven surrogate models. In particular, an AI framework, known as Neural Operators, presents a principled framework for learning mappings between functions defined on continuous domains, e.g., spatiotemporal processes and partial differential equations (PDE). They can extrapolate and predict solutions at new locations unseen during training, i.e., perform zero-shot super-resolution. Neural Operators can augment or even replace existing simulators in many applications, such as computational fluid dynamics, weather forecasting, and material modeling, while being 4-5 orders of magnitude faster. Further, Neural Operators can be integrated with physics and other domain constraints enforced at finer resolutions to obtain high-fidelity solutions and good generalization. Since Neural Operators are differentiable, they can directly optimize parameters for inverse design and other inverse problems. We believe that Neural Operators present a transformative approach to simulation and design, enabling rapid research and development.
ROMay 13, 2022
Neural-Fly Enables Rapid Learning for Agile Flight in Strong WindsMichael O'Connell, Guanya Shi, Xichen Shi et al.
Executing safe and precise flight maneuvers in dynamic high-speed winds is important for the ongoing commoditization of uninhabited aerial vehicles (UAVs). However, because the relationship between various wind conditions and its effect on aircraft maneuverability is not well understood, it is challenging to design effective robot controllers using traditional control design methods. We present Neural-Fly, a learning-based approach that allows rapid online adaptation by incorporating pretrained representations through deep learning. Neural-Fly builds on two key observations that aerodynamics in different wind conditions share a common representation and that the wind-specific part lies in a low-dimensional space. To that end, Neural-Fly uses a proposed learning algorithm, domain adversarially invariant meta-learning (DAIML), to learn the shared representation, only using 12 minutes of flight data. With the learned representation as a basis, Neural-Fly then uses a composite adaptation law to update a set of linear coefficients for mixing the basis elements. When evaluated under challenging wind conditions generated with the Caltech Real Weather Wind Tunnel, with wind speeds up to 43.6 kilometers/hour (12.1 meters/second), Neural-Fly achieves precise flight control with substantially smaller tracking error than state-of-the-art nonlinear and adaptive controllers. In addition to strong empirical performance, the exponential stability of Neural-Fly results in robustness guarantees. Last, our control design extrapolates to unseen wind conditions, is shown to be effective for outdoor flights with only onboard sensors, and can transfer across drones with minimal performance degradation.
LGNov 24, 2022
Fast Sampling of Diffusion Models via Operator LearningHongkai Zheng, Weili Nie, Arash Vahdat et al.
Diffusion models have found widespread adoption in various areas. However, their sampling process is slow because it requires hundreds to thousands of network evaluations to emulate a continuous process defined by differential equations. In this work, we use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models. Compared to other fast sampling methods that have a sequential nature, we are the first to propose a parallel decoding method that generates images with only one model forward pass. We propose diffusion model sampling with neural operator (DSNO) that maps the initial condition, i.e., Gaussian distribution, to the continuous-time solution trajectory of the reverse diffusion process. To model the temporal correlations along the trajectory, we introduce temporal convolution layers that are parameterized in the Fourier space into the given diffusion model backbone. We show our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.
LGOct 31, 2022
Real-time high-resolution CO$_2$ geological storage prediction using nested Fourier neural operatorsGege Wen, Zongyi Li, Qirui Long et al.
Carbon capture and storage (CCS) plays an essential role in global decarbonization. Scaling up CCS deployment requires accurate and high-resolution modeling of the storage reservoir pressure buildup and the gaseous plume migration. However, such modeling is very challenging at scale due to the high computational costs of existing numerical methods. This challenge leads to significant uncertainties in evaluating storage opportunities, which can delay the pace of large-scale CCS deployment. We introduce Nested Fourier Neural Operator (FNO), a machine-learning framework for high-resolution dynamic 3D CO2 storage modeling at a basin scale. Nested FNO produces forecasts at different refinement levels using a hierarchy of FNOs and speeds up flow prediction nearly 700,000 times compared to existing methods. By learning the solution operator for the family of governing partial differential equations, Nested FNO creates a general-purpose numerical simulator alternative for CO2 storage with diverse reservoir conditions, geological heterogeneity, and injection schemes. Our framework enables unprecedented real-time modeling and probabilistic simulations that can support the scale-up of global CCS deployment.
LGFeb 14, 2023
Score-based Diffusion Models in Function SpaceJae Hyun Lim, Nikola B. Kovachki, Ricardo Baptista et al.
Diffusion models have recently emerged as a powerful framework for generative modeling. They consist of a forward process that perturbs input data with Gaussian white noise and a reverse process that learns a score function to generate samples by denoising. Despite their tremendous success, they are mostly formulated on finite-dimensional spaces, e.g., Euclidean, limiting their applications to many domains where the data has a functional form, such as in scientific computing and 3D geometric data analysis. This work introduces a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space. In DDOs, the forward process perturbs input functions gradually using a Gaussian process. The generative process is formulated by a function-valued annealed Langevin dynamic. Our approach requires an appropriate notion of the score for the perturbed data distribution, which we obtain by generalizing denoising score matching to function spaces that can be infinite-dimensional. We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution. We theoretically and numerically verify the applicability of our approach on a set of function-valued problems, including generating solutions to the Navier-Stokes equation viewed as the push-forward distribution of forcings from a Gaussian Random Field (GRF), as well as volcano InSAR and MNIST-SDF.
LGMay 6, 2022
Generative Adversarial Neural OperatorsMd Ashiqur Rahman, Manuel A. Florez, Anima Anandkumar et al.
We propose the generative adversarial neural operator (GANO), a generative model paradigm for learning probabilities on infinite-dimensional function spaces. The natural sciences and engineering are known to have many types of data that are sampled from infinite-dimensional function spaces, where classical finite-dimensional deep generative adversarial networks (GANs) may not be directly applicable. GANO generalizes the GAN framework and allows for the sampling of functions by learning push-forward operator maps in infinite-dimensional spaces. GANO consists of two main components, a generator neural operator and a discriminator neural functional. The inputs to the generator are samples of functions from a user-specified probability measure, e.g., Gaussian random field (GRF), and the generator outputs are synthetic data functions. The input to the discriminator is either a real or synthetic data function. In this work, we instantiate GANO using the Wasserstein criterion and show how the Wasserstein loss can be computed in infinite-dimensional spaces. We empirically study GANO in controlled cases where both input and output functions are samples from GRFs and compare its performance to the finite-dimensional counterpart GAN. We empirically study the efficacy of GANO on real-world function data of volcanic activities and show its superior performance over GAN.
LGApr 23, 2022
U-NO: U-shaped Neural OperatorsMd Ashiqur Rahman, Zachary E. Ross, Kamyar Azizzadenesheli
Neural operators generalize classical neural networks to maps between infinite-dimensional spaces, e.g., function spaces. Prior works on neural operators proposed a series of novel methods to learn such maps and demonstrated unprecedented success in learning solution operators of partial differential equations. Due to their close proximity to fully connected architectures, these models mainly suffer from high memory usage and are generally limited to shallow deep learning models. In this paper, we propose U-shaped Neural Operator (U-NO), a U-shaped memory enhanced architecture that allows for deeper neural operators. U-NOs exploit the problem structures in function predictions and demonstrate fast training, data efficiency, and robustness with respect to hyperparameters choices. We study the performance of U-NO on PDE benchmarks, namely, Darcy's flow law and the Navier-Stokes equations. We show that U-NO results in an average of 26% and 44% prediction improvement on Darcy's flow and turbulent Navier-Stokes equations, respectively, over the state of the art. On Navier-Stokes 3D spatiotemporal operator learning task, we show U-NO provides 37% improvement over the state of art methods.
LGJun 22, 2022
Langevin Monte Carlo for Contextual BanditsPan Xu, Hongkai Zheng, Eric Mazumdar et al.
We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample in high dimensional applications for general covariance matrices. Moreover, the Gaussian approximation may not be a good surrogate for the posterior distribution for general reward generating functions. We propose an efficient posterior sampling algorithm, viz., Langevin Monte Carlo Thompson Sampling (LMC-TS), that uses Markov Chain Monte Carlo (MCMC) methods to directly sample from the posterior distribution in contextual bandits. Our method is computationally efficient since it only needs to perform noisy gradient descent updates without constructing the Laplace approximation of the posterior distribution. We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits, viz., linear contextual bandits. We conduct experiments on both synthetic data and real-world datasets on different contextual bandit models, which demonstrates that directly sampling from the posterior is both computationally efficient and competitive in performance.
LGJun 17, 2022
Thompson Sampling Achieves $\tilde O(\sqrt{T})$ Regret in Linear Quadratic ControlTaylan Kargin, Sahin Lale, Kamyar Azizzadenesheli et al.
Thompson Sampling (TS) is an efficient method for decision-making under uncertainty, where an action is sampled from a carefully prescribed distribution which is updated based on the observed data. In this work, we study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using TS, where the system dynamics are unknown. Previous works have established that $\tilde O(\sqrt{T})$ frequentist regret is optimal for the adaptive control of LQRs. However, the existing methods either work only in restrictive settings, require a priori known stabilizing controllers, or utilize computationally intractable approaches. We propose an efficient TS algorithm for the adaptive control of LQRs, TS-based Adaptive Control, TSAC, that attains $\tilde O(\sqrt{T})$ regret, even for multidimensional systems, thereby solving the open problem posed in Abeille and Lazaric (2018). TSAC does not require a priori known stabilizing controller and achieves fast stabilization of the underlying system by effectively exploring the environment in the early stages. Our result hinges on developing a novel lower bound on the probability that the TS provides an optimistic sample. By carefully prescribing an early exploration strategy and a policy update rule, we show that TS achieves order-optimal regret in adaptive control of multidimensional stabilizable LQRs. We empirically demonstrate the performance and the efficiency of TSAC in several adaptive control tasks.
LGSep 29, 2023
Multi-Grid Tensorized Fourier Neural Operator for High-Resolution PDEsJean Kossaifi, Nikola Kovachki, Kamyar Azizzadenesheli et al.
Memory complexity and data scarcity have so far prohibited learning solution operators of partial differential equations (PDEs) at high resolutions. We address these limitations by introducing a new data efficient and highly parallelizable operator learning approach with reduced memory requirement and better generalization, called multi-grid tensorized neural operator (MG-TFNO). MG-TFNO scales to large resolutions by leveraging local and global structures of full-scale, real-world phenomena, through a decomposition of both the input domain and the operator's parameter space. Our contributions are threefold: i) we enable parallelization over input samples with a novel multi-grid-based domain decomposition, ii) we represent the parameters of the model in a high-order latent subspace of the Fourier domain, through a global tensor factorization, resulting in an extreme reduction in the number of parameters and improved generalization, and iii) we propose architectural improvements to the backbone FNO. Our approach can be used in any operator learning setting. We demonstrate superior performance on the turbulent Navier-Stokes equations where we achieve less than half the error with over 150x compression. The tensorization combined with the domain decomposition, yields over 150x reduction in the number of parameters and 7x reduction in the domain size without losses in accuracy, while slightly enabling parallelism.
LGJul 27, 2023
Guaranteed Approximation Bounds for Mixed-Precision Neural OperatorsRenbo Tu, Colin White, Jean Kossaifi et al.
Neural operators, such as Fourier Neural Operators (FNO), form a principled approach for learning solution operators for PDEs and other mappings between function spaces. However, many real-world problems require high-resolution training data, and the training time and limited GPU memory pose big barriers. One solution is to train neural operators in mixed precision to reduce the memory requirement and increase training speed. However, existing mixed-precision training techniques are designed for standard neural networks, and we find that their direct application to FNO leads to numerical overflow and poor memory efficiency. Further, at first glance, it may appear that mixed precision in FNO will lead to drastic accuracy degradation since reducing the precision of the Fourier transform yields poor results in classical numerical solvers. We show that this is not the case; in fact, we prove that reducing the precision in FNO still guarantees a good approximation bound, when done in a targeted manner. Specifically, we build on the intuition that neural operator learning inherently induces an approximation error, arising from discretizing the infinite-dimensional ground-truth input function, implying that training in full precision is not needed. We formalize this intuition by rigorously characterizing the approximation and precision errors of FNO and bounding these errors for general input functions. We prove that the precision error is asymptotically comparable to the approximation error. Based on this, we design a simple method to optimize the memory-intensive half-precision tensor contractions by greedily finding the optimal contraction order. Through extensive experiments on different state-of-the-art neural operators, datasets, and GPUs, we demonstrate that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
LGJun 3, 2022
KCRL: Krasovskii-Constrained Reinforcement Learning with Guaranteed Stability in Nonlinear Dynamical SystemsSahin Lale, Yuanyuan Shi, Guannan Qu et al.
Learning a dynamical system requires stabilizing the unknown dynamics to avoid state blow-ups. However, current reinforcement learning (RL) methods lack stabilization guarantees, which limits their applicability for the control of safety-critical systems. We propose a model-based RL framework with formal stability guarantees, Krasovskii Constrained RL (KCRL), that adopts Krasovskii's family of Lyapunov functions as a stability constraint. The proposed method learns the system dynamics up to a confidence interval using feature representation, e.g. Random Fourier Features. It then solves a constrained policy optimization problem with a stability constraint based on Krasovskii's method using a primal-dual approach to recover a stabilizing policy. We show that KCRL is guaranteed to learn a stabilizing policy in a finite number of interactions with the underlying unknown system. We also derive the sample complexity upper bound for stabilization of unknown nonlinear dynamical systems via the KCRL framework.
GEO-PHSep 7, 2023
Broadband Ground Motion Synthesis via Generative Adversarial Neural Operators: Development and ValidationYaozhong Shi, Grigorios Lavrentiadis, Domniki Asimaki et al.
We present a data-driven framework for ground-motion synthesis that generates three-component acceleration time histories conditioned on moment magnitude, rupture distance , time-average shear-wave velocity at the top $30m$ ($V_{S30}$), and style of faulting. We use a Generative Adversarial Neural Operator (GANO), a resolution invariant architecture that guarantees model training independent of the data sampling frequency. We first present the conditional ground-motion synthesis algorithm (cGM-GANO) and discuss its advantages compared to previous work. We next train cGM-GANO on simulated ground motions generated by the Southern California Earthquake Center Broadband Platform (BBP) and on recorded KiK-net data and show that the model can learn the overall magnitude, distance, and $V_{S30}$ scaling of effective amplitude spectra (EAS) ordinates and pseudo-spectral accelerations (PSA). Results specifically show that cGM-GANO produces consistent median scaling with the training data for the corresponding tectonic environments over a wide range of frequencies for scenarios with sufficient data coverage. For the BBP dataset, cGM-GANO cannot learn the ground motion scaling of the stochastic frequency components; for the KiK-net dataset, the largest misfit is observed at short distances and for soft soil conditions due to the scarcity of such data. Except for these conditions, the aleatory variability of EAS and PSA are captured reasonably well. Lastly, cGM-GANO produces similar median scaling to traditional GMMs for frequencies greater than 1Hz for both PSA and EAS but underestimates the aleatory variability of EAS. Discrepancies in the comparisons between the synthetic ground motions and GMMs are attributed to inconsistencies between the training dataset and the datasets used in GMM development. Our pilot study demonstrates GANO's potential for efficient synthesis of broad-band ground motions
LGAug 17, 2023
Tipping Point Forecasting in Non-Stationary Dynamics on Function SpacesMiguel Liu-Schiaffini, Clare E. Singer, Nikola Kovachki et al.
Tipping points are abrupt, drastic, and often irreversible changes in the evolution of non-stationary and chaotic dynamical systems. For instance, increased greenhouse gas concentrations are predicted to lead to drastic decreases in low cloud cover, referred to as a climatological tipping point. In this paper, we learn the evolution of such non-stationary dynamical systems using a novel recurrent neural operator (RNO), which learns mappings between function spaces. After training RNO on only the pre-tipping dynamics, we employ it to detect future tipping points using an uncertainty-based approach. In particular, we propose a conformal prediction framework to forecast tipping points by monitoring deviations from physics constraints (such as conserved quantities and partial differential equations), enabling forecasting of these abrupt changes along with a rigorous measure of uncertainty. We illustrate our proposed methodology on non-stationary ordinary and partial differential equations, such as the Lorenz-63 and Kuramoto-Sivashinsky equations. We also apply our methods to forecast a climate tipping point in stratocumulus cloud cover and airfoil wake and stall transitions using only limited knowledge of the governing equations. For the latter, we show that our proposed method zero-shot generalizes to forecasting multiple future tipping points under varying Reynolds numbers. In our experiments, we demonstrate that even partial or approximate physics constraints can be used to accurately forecast future tipping points.
LGJul 10, 2024
Dynamical Measure Transport and Neural PDE Solvers for SamplingJingtong Sun, Julius Berner, Lorenz Richter et al.
The task of sampling from a probability density can be approached as transporting a tractable density function to the target, known as dynamical measure transport. In this work, we tackle it through a principled unified framework using deterministic or stochastic evolutions described by partial differential equations (PDEs). This framework incorporates prior trajectory-based sampling methods, such as diffusion models or Schrödinger bridges, without relying on the concept of time-reversals. Moreover, it allows us to propose novel numerical methods for solving the transport task and thus sampling from complicated targets without the need for the normalization constant or data samples. We employ physics-informed neural networks (PINNs) to approximate the respective PDE solutions, implying both conceptional and computational advantages. In particular, PINNs allow for simulation- and discretization-free optimization and can be trained very efficiently, leading to significantly better mode coverage in the sampling task compared to alternative methods. Moreover, they can readily be fine-tuned with Gauss-Newton methods to achieve high accuracy in sampling.
MLJun 27, 2022
Supervised Learning with General Risk FunctionalsLiu Leqi, Audrey Huang, Zachary C. Lipton et al.
Standard uniform convergence results bound the generalization gap of the expected loss over a hypothesis class. The emergence of risk-sensitive learning requires generalization guarantees for functionals of the loss distribution beyond the expectation. While prior works specialize in uniform convergence of particular functionals, our work provides uniform convergence for a general class of Hölder risk functionals for which the closeness in the Cumulative Distribution Function (CDF) entails closeness in risk. We establish the first uniform convergence results for estimating the CDF of the loss distribution, yielding guarantees that hold simultaneously both over all Hölder risk functionals and over all hypotheses. Thus licensed to perform empirical risk minimization, we develop practical gradient-based methods for minimizing distortion risks (widely studied subset of Hölder risks that subsumes the spectral risks, including the mean, conditional value at risk, cumulative prospect theory risks, and others) and provide convergence guarantees. In experiments, we demonstrate the efficacy of our learning procedure, both in settings where uniform convergence results hold and in high-dimensional settings with deep networks.
CVNov 25, 2022
PaCMO: Partner Dependent Human Motion Generation in Dyadic Human Activity using Neural OperatorsMd Ashiqur Rahman, Jasorsi Ghosh, Hrishikesh Viswanath et al.
We address the problem of generating 3D human motions in dyadic activities. In contrast to the concurrent works, which mainly focus on generating the motion of a single actor from the textual description, we generate the motion of one of the actors from the motion of the other participating actor in the action. This is a particularly challenging, under-explored problem, that requires learning intricate relationships between the motion of two actors participating in an action and also identifying the action from the motion of one actor. To address these, we propose partner conditioned motion operator (PaCMO), a neural operator-based generative model which learns the distribution of human motion conditioned by the partner's motion in function spaces through adversarial training. Our model can handle long unlabeled action sequences at arbitrary time resolution. We also introduce the "Functional Frechet Inception Distance" ($F^2ID$) metric for capturing similarity between real and generated data for function spaces. We test PaCMO on NTU RGB+D and DuetDance datasets and our model produces realistic results evidenced by the $F^2ID$ score and the conducted user study.
LGSep 21, 2022
Off-Policy Risk Assessment in Markov Decision ProcessesAudrey Huang, Liu Leqi, Zachary Chase Lipton et al.
Addressing such diverse ends as safety alignment with human preferences, and the efficiency of learning, a growing line of reinforcement learning research focuses on risk functionals that depend on the entire distribution of returns. Recent work on \emph{off-policy risk assessment} (OPRA) for contextual bandits introduced consistent estimators for the target policy's CDF of returns along with finite sample guarantees that extend to (and hold simultaneously over) all risk. In this paper, we lift OPRA to Markov decision processes (MDPs), where importance sampling (IS) CDF estimators suffer high variance on longer trajectories due to small effective sample size. To mitigate these problems, we incorporate model-based estimation to develop the first doubly robust (DR) estimator for the CDF of returns in MDPs. This estimator enjoys significantly less variance and, when the model is well specified, achieves the Cramer-Rao variance lower bound. Moreover, for many risk functionals, the downstream estimates enjoy both lower bias and lower variance. Additionally, we derive the first minimax lower bounds for off-policy CDF and risk estimation, which match our error bounds up to a constant factor. Finally, we demonstrate the precision of our DR CDF estimates experimentally on several different environments.
LGMay 28, 2022
Functional Linear Regression of Cumulative Distribution FunctionsQian Zhang, Anuran Makur, Kamyar Azizzadenesheli
The estimation of cumulative distribution functions (CDF) is an important learning task with a great variety of downstream applications, such as risk assessments in predictions and decision making. In this paper, we study functional regression of contextual CDFs where each data point is sampled from a linear combination of context dependent CDF basis functions. We propose functional ridge-regression-based estimation methods that estimate CDFs accurately everywhere. In particular, given $n$ samples with $d$ basis functions, we show estimation error upper bounds of $\widetilde O(\sqrt{d/n})$ for fixed design, random design, and adversarial context cases. We also derive matching information theoretic lower bounds, establishing minimax optimality for CDF functional regression. Furthermore, we remove the burn-in time in the random design setting using an alternative penalized estimator. Then, we consider agnostic settings where there is a mismatch in the data generation process. We characterize the error of the proposed estimators in terms of the mismatched error, and show that the estimators are well-behaved under model mismatch. Moreover, to complete our study, we formalize infinite dimensional models where the parameter space is an infinite dimensional Hilbert space, and establish a self-normalized estimation error upper bound for this setting. Notably, the upper bound reduces to the $\widetilde O(\sqrt{d/n})$ bound when the parameter space is constrained to be $d$-dimensional. Our comprehensive numerical experiments validate the efficacy of our estimation methods in both synthetic and practical settings.
OCMay 27, 2022
Competitive Gradient OptimizationAbhijeet Vyas, Kamyar Azizzadenesheli
We study the problem of convergence to a stationary point in zero-sum games. We propose competitive gradient optimization (CGO ), a gradient-based method that incorporates the interactions between the two players in zero-sum games for optimization updates. We provide continuous-time analysis of CGO and its convergence properties while showing that in the continuous limit, CGO predecessors degenerate to their gradient descent ascent (GDA) variants. We provide a rate of convergence to stationary points and further propose a generalized class of $α$-coherent function for which we provide convergence analysis. We show that for strictly $α$-coherent functions, our algorithm convergences to a saddle point. Moreover, we propose optimistic CGO (OCGO), an optimistic variant, for which we show convergence rate to saddle points in $α$-coherent class of functions.
GEO-PHFeb 12
Enforcing Reciprocity in Operator Learning for Seismic Wave PropagationCaifeng Zou, Yaozhong Shi, Zachary E. Ross et al.
Accurate and efficient wavefield modeling underpins seismic structure and source studies. Traditional methods comply with physical laws but are computationally intensive. Data-driven methods, while opening new avenues for advancement, have yet to incorporate strict physical consistency. The principle of reciprocity is one of the most fundamental physical laws in wave propagation. We introduce the Reciprocity-Enforced Neural Operator (RENO), a transformer-based architecture for modeling seismic wave propagation that hard-codes the reciprocity principle. The model leverages the cross-attention mechanism and commutative operations to guarantee invariance under swapping source and receiver positions. Beyond improved physical consistency, the proposed architecture supports simultaneous realizations for multiple sources without crosstalk issues. This yields an order-of-magnitude inference speedup at a similar memory footprint over an reciprocity-unenforced neural operator on a realistic configuration. We demonstrate the functionality using the reciprocity relation for particle velocity fields under single forces. This architecture is also applicable to pressure fields under dilatational sources and travel-time fields governed by the eikonal equation, paving the way for encoding more complex reciprocity relations.
OCJul 12, 2022
Compactly Restrictable Metric Policy Optimization ProblemsVictor D. Dorobantu, Kamyar Azizzadenesheli, Yisong Yue
We study policy optimization problems for deterministic Markov decision processes (MDPs) with metric state and action spaces, which we refer to as Metric Policy Optimization Problems (MPOPs). Our goal is to establish theoretical results on the well-posedness of MPOPs that can characterize practically relevant continuous control systems. To do so, we define a special class of MPOPs called Compactly Restrictable MPOPs (CR-MPOPs), which are flexible enough to capture the complex behavior of robotic systems but specific enough to admit solutions using dynamic programming methods such as value iteration. We show how to arrive at CR-MPOPs using forward-invariance. We further show that our theoretical results on CR-MPOPs can be used to characterize feedback linearizable control affine systems.
LGDec 13, 2024Code
A Library for Learning Neural OperatorsJean Kossaifi, Nikola Kovachki, Zongyi Li et al.
We present NeuralOperator, an open-source Python library for operator learning. Neural operators generalize neural networks to maps between function spaces instead of finite-dimensional Euclidean spaces. They can be trained and inferenced on input and output functions given at various discretizations, satisfying a discretization convergence properties. Built on top of PyTorch, NeuralOperator provides all the tools for training and deploying neural operator models, as well as developing new ones, in a high-quality, tested, open-source package. It combines cutting-edge models and customizability with a gentle learning curve and simple user interface for newcomers.
LGMay 22, 2025Code
Guided Diffusion Sampling on Function Spaces with Applications to PDEsJiachen Yao, Abbas Mammadov, Julius Berner et al.
We propose a general framework for conditional sampling in PDE-based inverse problems, targeting the recovery of whole solutions from extremely sparse or noisy measurements. This is accomplished by a function-space diffusion model and plug-and-play guidance for conditioning. Our method first trains an unconditional discretization-agnostic denoising model using neural operator architectures. At inference, we refine the samples to satisfy sparse observation data via a gradient-based guidance mechanism. Through rigorous mathematical analysis, we extend Tweedie's formula to infinite-dimensional Hilbert spaces, providing the theoretical foundation for our posterior sampling approach. Our method (FunDPS) accurately captures posterior distributions in function spaces under minimal supervision and severe data scarcity. Across five PDE tasks with only 3% observation, our method achieves an average 32% accuracy improvement over state-of-the-art fixed-resolution diffusion baselines while reducing sampling steps by 4x. Furthermore, multi-resolution fine-tuning ensures strong cross-resolution generalizability. To the best of our knowledge, this is the first diffusion-based framework to operate independently of discretization, offering a practical and flexible solution for forward and inverse problems in the context of PDEs. Code is available at https://github.com/neuraloperator/FunDPS
69.0LGMar 18
Large-Scale 3D Ground-Motion Synthesis with Physics-Inspired Latent Operator Flow MatchingYaozhong Shi, Grigorios Lavrentiadis, Konstantinos Tsalouchidis et al.
Earthquake hazard analysis and design of spatially distributed infrastructure, such as power grids and energy pipeline networks, require scenario-specific ground-motion time histories with realistic frequency content and spatiotemporal coherence. However, producing the large ensembles needed for uncertainty quantification with physics-based simulations is computationally intensive and impractical for engineering workflows. To address this challenge, we introduce Ground-Motion Flow (GMFlow), a physics-inspired latent operator flow matching framework that generates realistic, large-scale regional ground-motion time-histories conditioned on physical parameters. Validated on simulated earthquake scenarios in the San Francisco Bay Area, GMFlow generates spatially coherent ground motion across more than 9 million grid points in seconds, achieving a 10,000-fold speedup over the simulation workflow, which opens a path toward rapid and uncertainty-aware hazard assessment for distributed infrastructure. More broadly, GMFlow advances mesh-agnostic functional generative modeling and could potentially be extended to the synthesis of large-scale spatiotemporal physical fields in diverse scientific domains.
LGJun 12, 2025Code
Principled Approaches for Extending Neural Architectures to Function Spaces for Operator LearningJulius Berner, Miguel Liu-Schiaffini, Jean Kossaifi et al.
A wide range of scientific problems, such as those described by continuous-time dynamical systems and partial differential equations (PDEs), are naturally formulated on function spaces. While function spaces are typically infinite-dimensional, deep learning has predominantly advanced through applications in computer vision and natural language processing that focus on mappings between finite-dimensional spaces. Such fundamental disparities in the nature of the data have limited neural networks from achieving a comparable level of success in scientific applications as seen in other fields. Neural operators are a principled way to generalize neural networks to mappings between function spaces, offering a pathway to replicate deep learning's transformative impact on scientific problems. For instance, neural operators can learn solution operators for entire classes of PDEs, e.g., physical systems with different boundary conditions, coefficient functions, and geometries. A key factor in deep learning's success has been the careful engineering of neural architectures through extensive empirical testing. Translating these neural architectures into neural operators allows operator learning to enjoy these same empirical optimizations. However, prior neural operator architectures have often been introduced as standalone models, not directly derived as extensions of existing neural network architectures. In this paper, we identify and distill the key principles for constructing practical implementations of mappings between infinite-dimensional function spaces. Using these principles, we propose a recipe for converting several popular neural architectures into neural operators with minimal modifications. This paper aims to guide practitioners through this process and details the steps to make neural operators work in practice. Our code can be found at https://github.com/neuraloperator/NNs-to-NOs
LGJan 19, 2024Code
Equivariant Graph Neural Operator for Modeling 3D DynamicsMinkai Xu, Jiaqi Han, Aaron Lou et al.
Modeling the complex three-dimensional (3D) dynamics of relational systems is an important problem in the natural sciences, with applications ranging from molecular simulations to particle mechanics. Machine learning methods have achieved good success by learning graph neural networks to model spatial interactions. However, these approaches do not faithfully capture temporal correlations since they only model next-step predictions. In this work, we propose Equivariant Graph Neural Operator (EGNO), a novel and principled method that directly models dynamics as trajectories instead of just next-step prediction. Different from existing methods, EGNO explicitly learns the temporal evolution of 3D dynamics where we formulate the dynamics as a function over time and learn neural operators to approximate it. To capture the temporal correlations while keeping the intrinsic SE(3)-equivariance, we develop equivariant temporal convolutions parameterized in the Fourier space and build EGNO by stacking the Fourier layers over equivariant networks. EGNO is the first operator learning framework that is capable of modeling solution dynamics functions over time while retaining 3D equivariance. Comprehensive experiments in multiple domains, including particle simulations, human motion capture, and molecular dynamics, demonstrate the significantly superior performance of EGNO against existing methods, thanks to the equivariant temporal modeling. Our code is available at https://github.com/MinkaiXu/egno.
LGFeb 13, 2018Code
signSGD: Compressed Optimisation for Non-Convex ProblemsJeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli et al.
Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative $\ell_1/\ell_2$ geometry of gradients, noise and curvature informs whether signSGD or SGD is theoretically better suited to a particular problem. On the practical side we find that the momentum counterpart of signSGD is able to match the accuracy and convergence speed of Adam on deep Imagenet models. We extend our theory to the distributed setting, where the parameter server uses majority vote to aggregate gradient signs from each worker enabling 1-bit compression of worker-server communication in both directions. Using a theorem by Gauss we prove that majority vote can achieve the same reduction in variance as full precision distributed SGD. Thus, there is great promise for sign-based optimisation schemes to achieve fast communication and fast convergence. Code to reproduce experiments is to be found at https://github.com/jxbz/signSGD .
LGFeb 26, 2024
Neural Operators with Localized Integral and Differential KernelsMiguel Liu-Schiaffini, Julius Berner, Boris Bonev et al.
Neural operators learn mappings between function spaces, which is practical for learning solution operators of PDEs and other scientific modeling applications. Among them, the Fourier neural operator (FNO) is a popular architecture that performs global convolutions in the Fourier space. However, such global operations are often prone to over-smoothing and may fail to capture local details. In contrast, convolutional neural networks (CNN) can capture local features but are limited to training and inference at a single resolution. In this work, we present a principled approach to operator learning that can capture local features under two frameworks by learning differential operators and integral operators with locally supported kernels. Specifically, inspired by stencil methods, we prove that we obtain differential operators under an appropriate scaling of the kernel values of CNNs. To obtain local integral operators, we utilize suitable basis representations for the kernels based on discrete-continuous convolutions. Both these approaches preserve the properties of operator learning and, hence, the ability to predict at any resolution. Adding our layers to FNOs significantly improves their performance, reducing the relative L2-error by 34-72% in our experiments, which include a turbulent 2D Navier-Stokes and the spherical shallow water equations.
LGFeb 2, 2024
Calibrated Uncertainty Quantification for Operator Learning via Conformal PredictionZiqi Ma, Kamyar Azizzadenesheli, Anima Anandkumar
Operator learning has been increasingly adopted in scientific and engineering applications, many of which require calibrated uncertainty quantification. Since the output of operator learning is a continuous function, quantifying uncertainty simultaneously at all points in the domain is challenging. Current methods consider calibration at a single point or over one scalar function or make strong assumptions such as Gaussianity. We propose a risk-controlling quantile neural operator, a distribution-free, finite-sample functional calibration conformal prediction method. We provide a theoretical calibration guarantee on the coverage rate, defined as the expected percentage of points on the function domain whose true value lies within the predicted uncertainty ball. Empirical results on a 2D Darcy flow and a 3D car surface pressure prediction task validate our theoretical results, demonstrating calibrated coverage and efficient uncertainty bands outperforming baseline methods. In particular, on the 3D problem, our method is the only one that meets the target calibration percentage (percentage of test samples for which the uncertainty estimates are calibrated) of 98%.
CVFeb 6, 2025
Factorized Implicit Global Convolution for Automotive Computational Fluid Dynamics PredictionChris Choy, Alexey Kamenev, Jean Kossaifi et al.
Computational Fluid Dynamics (CFD) is crucial for automotive design, requiring the analysis of large 3D point clouds to study how vehicle geometry affects pressure fields and drag forces. However, existing deep learning approaches for CFD struggle with the computational complexity of processing high-resolution 3D data. We propose Factorized Implicit Global Convolution (FIGConv), a novel architecture that efficiently solves CFD problems for very large 3D meshes with arbitrary input and output geometries. FIGConv achieves quadratic complexity $O(N^2)$, a significant improvement over existing 3D neural CFD models that require cubic complexity $O(N^3)$. Our approach combines Factorized Implicit Grids to approximate high-resolution domains, efficient global convolutions through 2D reparameterization, and a U-shaped architecture for effective information gathering and integration. We validate our approach on the industry-standard Ahmed body dataset and the large-scale DrivAerNet dataset. In DrivAerNet, our model achieves an $R^2$ value of 0.95 for drag prediction, outperforming the previous state-of-the-art by a significant margin. This represents a 40% improvement in relative mean squared error and a 70% improvement in absolute mean squared error over previous methods.
COMP-PHMar 14, 2025
Fourier Neural Operator based surrogates for $CO_2$ storage in realistic geologiesAnirban Chandra, Marius Koch, Suraj Pawar et al.
This study aims to develop surrogate models for accelerating decision making processes associated with carbon capture and storage (CCS) technologies. Selection of sub-surface $CO_2$ storage sites often necessitates expensive and involved simulations of $CO_2$ flow fields. Here, we develop a Fourier Neural Operator (FNO) based model for real-time, high-resolution simulation of $CO_2$ plume migration. The model is trained on a comprehensive dataset generated from realistic subsurface parameters and offers $O(10^5)$ computational acceleration with minimal sacrifice in prediction accuracy. We also explore super-resolution experiments to improve the computational cost of training the FNO based models. Additionally, we present various strategies for improving the reliability of predictions from the model, which is crucial while assessing actual geological sites. This novel framework, based on NVIDIA's Modulus library, will allow rapid screening of sites for CCS. The discussed workflows and strategies can be applied to other energy solutions like geothermal reservoir modeling and hydrogen storage. Our work scales scientific machine learning models to realistic 3D systems that are more consistent with real-life subsurface aquifers/reservoirs, paving the way for next-generation digital twins for subsurface CCS applications.
LGApr 3, 2024
Universal Functional Regression with Neural Operator FlowsYaozhong Shi, Angela F. Gao, Zachary E. Ross et al.
Regression on function spaces is typically limited to models with Gaussian process priors. We introduce the notion of universal functional regression, in which we aim to learn a prior distribution over non-Gaussian function spaces that remains mathematically tractable for functional regression. To do this, we develop Neural Operator Flows (OpFlow), an infinite-dimensional extension of normalizing flows. OpFlow is an invertible operator that maps the (potentially unknown) data function space into a Gaussian process, allowing for exact likelihood estimation of functional point evaluations. OpFlow enables robust and accurate uncertainty quantification via drawing posterior samples of the Gaussian process and subsequently mapping them into the data function space. We empirically study the performance of OpFlow on regression and generation tasks with data generated from Gaussian processes with known posterior forms and non-Gaussian processes, as well as real-world earthquake seismograms with an unknown closed-form distribution.
LGJan 7, 2025
Stochastic Process Learning via Operator Flow MatchingYaozhong Shi, Zachary E. Ross, Domniki Asimaki et al.
Expanding on neural operators, we propose a novel framework for stochastic process learning across arbitrary domains. In particular, we develop operator flow matching (OFM) for learning stochastic process priors on function spaces. OFM provides the probability density of the values of any collection of points and enables mathematically tractable functional regression at new points with mean and density estimation. Our method outperforms state-of-the-art models in stochastic process learning, functional regression, and prior learning.
IVJan 2, 2025
Ultrasound Lung Aeration Map via Physics-Aware Neural OperatorsJiayun Wang, Oleksii Ostras, Masashi Sode et al.
Lung ultrasound is a growing modality in clinics for diagnosing and monitoring acute and chronic lung diseases due to its low cost and accessibility. Lung ultrasound works by emitting diagnostic pulses, receiving pressure waves and converting them into radio frequency (RF) data, which are then processed into B-mode images with beamformers for radiologists to interpret. However, unlike conventional ultrasound for soft tissue anatomical imaging, lung ultrasound interpretation is complicated by complex reverberations from the pleural interface caused by the inability of ultrasound to penetrate air. The indirect B-mode images make interpretation highly dependent on reader expertise, requiring years of training, which limits its widespread use despite its potential for high accuracy in skilled hands. To address these challenges and democratize ultrasound lung imaging as a reliable diagnostic tool, we propose LUNA, an AI model that directly reconstructs lung aeration maps from RF data, bypassing the need for traditional beamformers and indirect interpretation of B-mode images. LUNA uses a Fourier neural operator, which processes RF data efficiently in Fourier space, enabling accurate reconstruction of lung aeration maps. LUNA offers a quantitative, reader-independent alternative to traditional semi-quantitative lung ultrasound scoring methods. The development of LUNA involves synthetic and real data: We simulate synthetic data with an experimentally validated approach and scan ex vivo swine lungs as real data. Trained on abundant simulated data and fine-tuned with a small amount of real-world data, LUNA achieves robust performance, demonstrated by an aeration estimation error of 9% in ex-vivo lung scans. We demonstrate the potential of reconstructing lung aeration maps from RF data, providing a foundation for improving lung ultrasound reproducibility and diagnostic utility.
LGJun 9, 2025
Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range TasksAli Hariri, Álvaro Arroyo, Alessio Gravina et al.
ChebNet, one of the earliest spectral GNNs, has largely been overshadowed by Message Passing Neural Networks (MPNNs), which gained popularity for their simplicity and effectiveness in capturing local graph structure. Despite their success, MPNNs are limited in their ability to capture long-range dependencies between nodes. This has led researchers to adapt MPNNs through rewiring or make use of Graph Transformers, which compromises the computational efficiency that characterized early spatial message-passing architectures, and typically disregards the graph structure. Almost a decade after its original introduction, we revisit ChebNet to shed light on its ability to model distant node interactions. We find that out-of-box, ChebNet already shows competitive advantages relative to classical MPNNs and GTs on long-range benchmarks, while maintaining good scalability properties for high-order polynomials. However, we uncover that this polynomial expansion leads ChebNet to an unstable regime during training. To address this limitation, we cast ChebNet as a stable and non-dissipative dynamical system, which we coin Stable-ChebNet. Our Stable-ChebNet model allows for stable information propagation, and has controllable dynamics which do not require the use of eigendecompositions, positional encodings, or graph rewiring. Across several benchmarks, Stable-ChebNet achieves near state-of-the-art performance.
LGJun 20, 2025
Mesh-Informed Neural Operator : A Transformer Generative ApproachYaozhong Shi, Zachary E. Ross, Domniki Asimaki et al.
Generative models in function spaces, situated at the intersection of generative modeling and operator learning, are attracting increasing attention due to their immense potential in diverse scientific and engineering applications. While functional generative models are theoretically domain- and discretization-agnostic, current implementations heavily rely on the Fourier Neural Operator (FNO), limiting their applicability to regular grids and rectangular domains. To overcome these critical limitations, we introduce the Mesh-Informed Neural Operator (MINO). By leveraging graph neural operators and cross-attention mechanisms, MINO offers a principled, domain- and discretization-agnostic backbone for generative modeling in function spaces. This advancement significantly expands the scope of such models to more diverse applications in generative, inverse, and regression tasks. Furthermore, MINO provides a unified perspective on integrating neural operators with general advanced deep learning architectures. Finally, we introduce a suite of standardized evaluation metrics that enable objective comparison of functional generative models, addressing another critical gap in the field.
LGApr 11, 2025
Enabling Automatic Differentiation with Mollified Graph Neural OperatorsRyan Y. Lin, Julius Berner, Valentin Duruisseaux et al.
Physics-informed neural operators offer a powerful framework for learning solution operators of partial differential equations (PDEs) by combining data and physics losses. However, these physics losses rely on derivatives. Computing these derivatives remains challenging, with spectral and finite difference methods introducing approximation errors due to finite resolution. Here, we propose the mollified graph neural operator ($m$GNO), the first method to leverage automatic differentiation and compute exact gradients on arbitrary geometries. This enhancement enables efficient training on irregular grids and varying geometries while allowing seamless evaluation of physics losses at randomly sampled points for improved generalization. For a PDE example on regular grids, $m$GNO paired with autograd reduced the L2 relative data error by 20x compared to finite differences, although training was slower. It can also solve PDEs on unstructured point clouds seamlessly, using physics losses only, at resolutions vastly lower than those needed for finite differences to be accurate enough. On these unstructured point clouds, $m$GNO leads to errors that are consistently 2 orders of magnitude lower than machine learning baselines (Meta-PDE, which accelerates PINNs) for comparable runtimes, and also delivers speedups from 1 to 3 orders of magnitude compared to the numerical solver for similar accuracy. $m$GNOs can also be used to solve inverse design and shape optimization problems on complex geometries.
GEO-PHMar 19, 2025
Ambient Noise Full Waveform Inversion with Neural OperatorsCaifeng Zou, Zachary E. Ross, Robert W. Clayton et al.
Numerical simulations of seismic wave propagation are crucial for investigating velocity structures and improving seismic hazard assessment. However, standard methods such as finite difference or finite element are computationally expensive. Recent studies have shown that a new class of machine learning models, called neural operators, can solve the elastodynamic wave equation orders of magnitude faster than conventional methods. Full waveform inversion is a prime beneficiary of the accelerated simulations. Neural operators, as end-to-end differentiable operators, combined with automatic differentiation, provide an alternative approach to the adjoint-state method. State-of-the-art optimization techniques built into PyTorch provide neural operators with greater flexibility to improve the optimization dynamics of full waveform inversion, thereby mitigating cycle-skipping problems. In this study, we demonstrate the first application of neural operators for full waveform inversion on a real seismic dataset, which consists of several nodal transects collected across the San Gabriel, Chino, and San Bernardino basins in the Los Angeles metropolitan area.
LGOct 3, 2025
Physics-informed Neural-operator Predictive Control for Drag Reduction in Turbulent FlowsZelin Zhao, Zongyi Li, Kimia Hassibi et al.
Assessing turbulence control effects for wall friction numerically is a significant challenge since it requires expensive simulations of turbulent fluid dynamics. We instead propose an efficient deep reinforcement learning (RL) framework for modeling and control of turbulent flows. It is model-based RL for predictive control (PC), where both the policy and the observer models for turbulence control are learned jointly using Physics Informed Neural Operators (PINO), which are discretization invariant and can capture fine scales in turbulent flows accurately. Our PINO-PC outperforms prior model-free reinforcement learning methods in various challenging scenarios where the flows are of high Reynolds numbers and unseen, i.e., not provided during model training. We find that PINO-PC achieves a drag reduction of 39.0\% under a bulk-velocity Reynolds number of 15,000, outperforming previous fluid control methods by more than 32\%.
IVSep 11, 2025
Accelerating 3D Photoacoustic Computed Tomography with End-to-End Physics-Aware Neural OperatorsJiayun Wang, Yousuf Aborahama, Arya Khokhar et al.
Photoacoustic computed tomography (PACT) combines optical contrast with ultrasonic resolution, achieving deep-tissue imaging beyond the optical diffusion limit. While three-dimensional PACT systems enable high-resolution volumetric imaging for applications spanning transcranial to breast imaging, current implementations require dense transducer arrays and prolonged acquisition times, limiting clinical translation. We introduce Pano (PACT imaging neural operator), an end-to-end physics-aware model that directly learns the inverse acoustic mapping from sensor measurements to volumetric reconstructions. Unlike existing approaches (e.g. universal back-projection algorithm), Pano learns both physics and data priors while also being agnostic to the input data resolution. Pano employs spherical discrete-continuous convolutions to preserve hemispherical sensor geometry, incorporates Helmholtz equation constraints to ensure physical consistency and operates resolutionindependently across varying sensor configurations. We demonstrate the robustness and efficiency of Pano in reconstructing high-quality images from both simulated and real experimental data, achieving consistent performance even with significantly reduced transducer counts and limited-angle acquisition configurations. The framework maintains reconstruction fidelity across diverse sparse sampling patterns while enabling real-time volumetric imaging capabilities. This advancement establishes a practical pathway for making 3D PACT more accessible and feasible for both preclinical research and clinical applications, substantially reducing hardware requirements without compromising image reconstruction quality.
SYAug 27, 2025
Neural Spline Operators for Risk Quantification in Stochastic SystemsZhuoyuan Wang, Raffaele Romagnoli, Kamyar Azizzadenesheli et al. · cmu
Accurately quantifying long-term risk probabilities in diverse stochastic systems is essential for safety-critical control. However, existing sampling-based and partial differential equation (PDE)-based methods often struggle to handle complex varying dynamics. Physics-informed neural networks learn surrogate mappings for risk probabilities from varying system parameters of fixed and finite dimensions, yet can not account for functional variations in system dynamics. To address these challenges, we introduce physics-informed neural operator (PINO) methods to risk quantification problems, to learn mappings from varying \textit{functional} system dynamics to corresponding risk probabilities. Specifically, we propose Neural Spline Operators (NeSO), a PINO framework that leverages B-spline representations to improve training efficiency and achieve better initial and boundary condition enforcements, which are crucial for accurate risk quantification. We provide theoretical analysis demonstrating the universal approximation capability of NeSO. We also present two case studies, one with varying functional dynamics and another with high-dimensional multi-agent dynamics, to demonstrate the efficacy of NeSO and its significant online speed-up over existing methods. The proposed framework and the accompanying universal approximation theorem are expected to be beneficial for other control or PDE-related problems beyond risk quantification.
LGJul 24, 2025
Scale-Consistent Learning for Partial Differential EquationsZongyi Li, Samuel Lanthaler, Catherine Deng et al.
Machine learning (ML) models have emerged as a promising approach for solving partial differential equations (PDEs) in science and engineering. Previous ML models typically cannot generalize outside the training data; for example, a trained ML model for the Navier-Stokes equations only works for a fixed Reynolds number ($Re$) on a pre-defined domain. To overcome these limitations, we propose a data augmentation scheme based on scale-consistency properties of PDEs and design a scale-informed neural operator that can model a wide range of scales. Our formulation leverages the facts: (i) PDEs can be rescaled, or more concretely, a given domain can be re-scaled to unit size, and the parameters and the boundary conditions of the PDE can be appropriately adjusted to represent the original solution, and (ii) the solution operators on a given domain are consistent on the sub-domains. We leverage these facts to create a scale-consistency loss that encourages matching the solutions evaluated on a given domain and the solution obtained on its sub-domain from the rescaled PDE. Since neural operators can fit to multiple scales and resolutions, they are the natural choice for incorporating scale-consistency loss during training of neural PDE solvers. We experiment with scale-consistency loss and the scale-informed neural operator model on the Burgers' equation, Darcy Flow, Helmholtz equation, and Navier-Stokes equations. With scale-consistency, the model trained on $Re$ of 1000 can generalize to $Re$ ranging from 250 to 10000, and reduces the error by 34% on average of all datasets compared to baselines.
CVJul 3, 2025
Neural Dynamic Modes: Computational Imaging of Dynamical Systems from Sparse ObservationsAli SaraerToosi, Renbo Tu, Kamyar Azizzadenesheli et al.
Dynamical systems are ubiquitous within science and engineering, from turbulent flow across aircraft wings to structural variability of proteins. Although some systems are well understood and simulated, scientific imaging often confronts never-before-seen dynamics observed through indirect, noisy, and highly sparse measurements. We present NeuralDMD, a model-free framework that combines neural implicit representations with Dynamic Mode Decomposition (DMD) to reconstruct continuous spatio-temporal dynamics from such measurements. The expressiveness of neural representations enables capturing complex spatial structures, while the linear dynamical modes of DMD introduce an inductive bias that guides training and supports stable, low-dimensional representations and forecasting. We validate NeuralDMD on two real-world problems: reconstructing near-surface wind-speed fields over North America from sparse station observations, and recovering the evolution of plasma near the Galactic-center black hole, Sgr A*. In both cases, NeuralDMD outperforms established baselines, demonstrating its potential as a general tool for imaging dynamical systems across geoscience, astronomy, and beyond.
AIApr 28, 2025
Deep Generative Prior for First Order Inverse OptimizationHaoyu Yang, Kamyar Azizzadenesheli, Haoxing Ren
Inverse design optimization aims to infer system parameters from observed solutions, posing critical challenges across domains such as semiconductor manufacturing, structural engineering, materials science, and fluid dynamics. The lack of explicit mathematical representations in many systems complicates this process and makes the first order optimization impossible. Mainstream approaches, including generative AI and Bayesian optimization, address these challenges but have limitations. Generative AI is computationally expensive, while Bayesian optimization, relying on surrogate models, suffers from scalability, sensitivity to priors, and noise issues, often leading to suboptimal solutions. This paper introduces Deep Physics Prior (DPP), a novel method enabling first-order gradient-based inverse optimization with surrogate machine learning models. By leveraging pretrained auxiliary Neural Operators, DPP enforces prior distribution constraints to ensure robust and meaningful solutions. This approach is particularly effective when prior data and observation distributions are unknown.
LGMar 19, 2024
Pretraining Codomain Attention Neural Operators for Solving Multiphysics PDEsMd Ashiqur Rahman, Robert Joseph George, Mogab Elleithy et al.
Existing neural operator architectures face challenges when solving multiphysics problems with coupled partial differential equations (PDEs) due to complex geometries, interactions between physical variables, and the limited amounts of high-resolution training data. To address these issues, we propose Codomain Attention Neural Operator (CoDA-NO), which tokenizes functions along the codomain or channel space, enabling self-supervised learning or pretraining of multiple PDE systems. Specifically, we extend positional encoding, self-attention, and normalization layers to function spaces. CoDA-NO can learn representations of different PDE systems with a single model. We evaluate CoDA-NO's potential as a backbone for learning multiphysics PDEs over multiple systems by considering few-shot learning settings. On complex downstream tasks with limited data, such as fluid flow simulations, fluid-structure interactions, and Rayleigh-Bénard convection, we found CoDA-NO to outperform existing methods by over 36%.
LGSep 1, 2023
Geometry-Informed Neural Operator for Large-Scale 3D PDEsZongyi Li, Nikola Borislavov Kovachki, Chris Choy et al.
We propose the geometry-informed neural operator (GINO), a highly efficient approach to learning the solution operator of large-scale partial differential equations with varying geometries. GINO uses a signed distance function and point-cloud representations of the input shape and neural operators based on graph and Fourier architectures to learn the solution operator. The graph neural operator handles irregular grids and transforms them into and from regular latent grids on which Fourier neural operator can be efficiently applied. GINO is discretization-convergent, meaning the trained model can be applied to arbitrary discretization of the continuous domain and it converges to the continuum operator as the discretization is refined. To empirically validate the performance of our method on large-scale simulation, we generate the industry-standard aerodynamics dataset of 3D vehicle geometries with Reynolds numbers as high as five million. For this large-scale 3D fluid simulation, numerical methods are expensive to compute surface pressure. We successfully trained GINO to predict the pressure on car surfaces using only five hundred data points. The cost-accuracy experiments show a $26,000 \times$ speed-up compared to optimized GPU-based computational fluid dynamics (CFD) simulators on computing the drag coefficient. When tested on new combinations of geometries and boundary conditions (inlet velocities), GINO obtains a one-fourth reduction in error rate compared to deep neural network approaches.
LGMay 29, 2023
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte CarloHaque Ishfaq, Qingfeng Lan, Pan Xu et al.
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo, an efficient type of Markov Chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes our approach easy to deploy in deep RL. We provide a rigorous theoretical analysis for the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it has a regret bound of $\tilde{O}(d^{3/2}H^{3/2}\sqrt{T})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $T$ is the total number of steps. We apply this approach to deep RL, by using Adam optimizer to perform gradient updates. Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
AO-PHFeb 22, 2022
FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural OperatorsJaideep Pathak, Shashank Subramanian, Peter Harrington et al.
FourCastNet, short for Fourier Forecasting Neural Network, is a global data-driven weather forecasting model that provides accurate short to medium-range global predictions at $0.25^{\circ}$ resolution. FourCastNet accurately forecasts high-resolution, fast-timescale variables such as the surface wind speed, precipitation, and atmospheric water vapor. It has important implications for planning wind energy resources, predicting extreme weather events such as tropical cyclones, extra-tropical cyclones, and atmospheric rivers. FourCastNet matches the forecasting accuracy of the ECMWF Integrated Forecasting System (IFS), a state-of-the-art Numerical Weather Prediction (NWP) model, at short lead times for large-scale variables, while outperforming IFS for variables with complex fine-scale structure, including precipitation. FourCastNet generates a week-long forecast in less than 2 seconds, orders of magnitude faster than IFS. The speed of FourCastNet enables the creation of rapid and inexpensive large-ensemble forecasts with thousands of ensemble-members for improving probabilistic forecasting. We discuss how data-driven deep learning models such as FourCastNet are a valuable addition to the meteorology toolkit to aid and augment NWP models.