NAJul 20, 2021
A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation SolutionsYingbo Ma, Vaibhav Dixit, Mike Innes et al.
Derivatives of differential equation solutions are commonly for parameter estimation, fitting neural differential equations, and as model diagnostics. However, with a litany of choices and a Cartesian product of potential methods, it can be difficult for practitioners to understand which method is likely to be the most effective on their particular application. In this manuscript we investigate the performance characteristics of Discrete Local Sensitivity Analysis implemented via Automatic Differentiation (DSAAD) against continuous adjoint sensitivity analysis. Non-stiff and stiff biological and pharmacometric models, including a PDE discretization, are used to quantify the performance of sensitivity analysis methods. Our benchmarks show that on small systems of ODEs (approximately $<100$ parameters+ODEs), forward-mode DSAAD is more efficient than both reverse-mode and continuous forward/adjoint sensitivity analysis. The scalability of continuous adjoint methods is shown to be more efficient than discrete adjoints and forward methods after crossing this size range. These comparative studies demonstrate a trade-off between memory usage and performance in the continuous adjoint methods that should be considered when choosing the technique, while numerically unstable backsolve techniques from the machine learning literature are demonstrated as unsuitable for most scientific models. The performance of adjoint methods is shown to be heavily tied to the reverse-mode AD method, with tape-based AD methods shown to be 2 orders of magnitude slower on nonlinear partial differential equations than static AD techniques. These results also demonstrate the applicability of DSAAD to differential-algebraic equations, delay differential equations, and hybrid differential equation systems, showcasing an ease of implementation advantage for DSAAD approaches.
NAApr 12, 2018
Stability-Optimized High Order Methods and Stiffness Detection for Pathwise Stiff Stochastic Differential EquationsChristopher Rackauckas, Qing Nie
Stochastic differential equations (SDE) often exhibit large random transitions. This property, which we denote as pathwise stiffness, causes transient bursts of stiffness which limit the allowed step size for common fixed time step explicit and drift-implicit integrators. We present four separate methods to efficiently handle this stiffness. First, we utilize a computational technique to derive stability-optimized adaptive methods of strong order 1.5 for SDEs. The resulting explicit methods are shown to exhibit substantially enlarged stability regions which allows for them to solve pathwise stiff biological models orders of magnitude more efficiently than previous methods like SRIW1 and Euler-Maruyama. Secondly, these integrators include a stiffness estimator which allows for automatically switching between implicit and explicit schemes based on the current stiffness. In addition, adaptive L-stable strong order 1.5 implicit integrators for SDEs and stochastic differential algebraic equations (SDAEs) in mass-matrix form with additive noise are derived and are demonstrated as more efficient than the explicit methods on stiff chemical reaction networks by nearly 8x. Lastly, we developed an adaptive implicit-explicit (IMEX) integration method based off of a common method for diffusion-reaction-convection PDEs and show numerically that it can achieve strong order 1.5. These methods are benchmarked on a range of problems varying from non-stiff to extreme pathwise stiff and demonstrate speedups between 5x-6000x while showing computationally infeasibility of fixed time step integrators on many of these test equations.
MLJun 13, 2023
Differentiating Metropolis-Hastings to Optimize Intractable DensitiesGaurav Arya, Ruben Seyer, Frank Schäfer et al.
We develop an algorithm for automatic differentiation of Metropolis-Hastings samplers, allowing us to differentiate through probabilistic inference, even if the model has discrete components within it. Our approach fuses recent advances in stochastic automatic differentiation with traditional Markov chain coupling schemes, providing an unbiased and low-variance gradient estimator. This allows us to apply gradient-based optimization to objectives expressed as expectations over intractable target densities. We demonstrate our approach by finding an ambiguous observation in a Gaussian mixture model and by maximizing the specific heat in an Ising model.
AISep 2, 2025
The Future of Artificial Intelligence and the Mathematical and Physical Sciences (AI+MPS)Andrew Ferguson, Marisa LaFleur, Lars Ruthotto et al. · stanford
This community paper developed out of the NSF Workshop on the Future of Artificial Intelligence (AI) and the Mathematical and Physics Sciences (MPS), which was held in March 2025 with the goal of understanding how the MPS domains (Astronomy, Chemistry, Materials Research, Mathematical Sciences, and Physics) can best capitalize on, and contribute to, the future of AI. We present here a summary and snapshot of the MPS community's perspective, as of Spring/Summer 2025, in a rapidly developing field. The link between AI and MPS is becoming increasingly inextricable; now is a crucial moment to strengthen the link between AI and Science by pursuing a strategy that proactively and thoughtfully leverages the potential of AI for scientific discovery and optimizes opportunities to impact the development of AI by applying concepts from fundamental science. To achieve this, we propose activities and strategic priorities that: (1) enable AI+MPS research in both directions; (2) build up an interdisciplinary community of AI+MPS researchers; and (3) foster education and workforce development in AI for MPS researchers and students. We conclude with a summary of suggested priorities for funding agencies, educational institutions, and individual researchers to help position the MPS community to be a leader in, and take full advantage of, the transformative potential of AI+MPS.
LGJan 28, 2022
Continuous Deep Equilibrium Models: Training Neural ODEs faster by integrating them to InfinityAvik Pal, Alan Edelman, Christopher Rackauckas
Implicit models separate the definition of a layer from the description of its solution process. While implicit layers allow features such as depth to adapt to new scenarios and inputs automatically, this adaptivity makes its computational expense challenging to predict. In this manuscript, we increase the "implicitness" of the DEQ by redefining the method in terms of an infinite time neural ODE, which paradoxically decreases the training cost over a standard neural ODE by 2-4x. Additionally, we address the question: is there a way to simultaneously achieve the robustness of implicit layers while allowing the reduced computational expense of an explicit layer? To solve this, we develop Skip and Skip Reg. DEQ, an implicit-explicit (IMEX) layer that simultaneously trains an explicit prediction followed by an implicit correction. We show that training this explicit predictor is free and even decreases the training time by 1.11-3.19x. Together, this manuscript shows how bridging the dichotomy of implicit and explicit deep learning can combine the advantages of both techniques.
CLMay 9, 2021
High-performance symbolic-numerics via multiple dispatchShashi Gowda, Yingbo Ma, Alessandro Cheli et al.
As mathematical computing becomes more democratized in high-level languages, high-performance symbolic-numeric systems are necessary for domain scientists and engineers to get the best performance out of their machine without deep knowledge of code optimization. Naturally, users need different term types either to have different algebraic properties for them, or to use efficient data structures. To this end, we developed Symbolics.jl, an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs. In this work we detail an underlying abstract term interface which allows for speed without sacrificing generality. We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system without changing the pre-existing term rewriters. We showcase how this can be used to optimize term construction and give a 113x acceleration on general symbolic transformations. Further, we show that such a generic API allows for complementary term-rewriting implementations. We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers. We showcase an e-graph ruleset which minimizes the number of CPU cycles during expression evaluation, and demonstrate how it simplifies a real-world reaction-network simulation to halve the runtime. Additionally, we show a reaction-diffusion partial differential equation solver which is able to be automatically converted into symbolic expressions via multiple dispatch tracing, which is subsequently accelerated and parallelized to give a 157x simulation speedup. Together, this presents Symbolics.jl as a next-generation symbolic-numeric computing environment geared towards modeling and simulation.
LGMay 9, 2021
Opening the Blackbox: Accelerating Neural Differential Equations by Regularizing Internal Solver HeuristicsAvik Pal, Yingbo Ma, Viral Shah et al.
Democratization of machine learning requires architectures that automatically adapt to new problems. Neural Differential Equations (NDEs) have emerged as a popular modeling framework by removing the need for ML practitioners to choose the number of layers in a recurrent model. While we can control the computational cost by choosing the number of layers in standard architectures, in NDEs the number of neural network evaluations for a forward pass can depend on the number of steps of the adaptive ODE solver. But, can we force the NDE to learn the version with the least steps while not increasing the training cost? Current strategies to overcome slow prediction require high order automatic differentiation, leading to significantly higher training time. We describe a novel regularization method that uses the internal cost heuristics of adaptive differential equation solvers combined with discrete adjoint sensitivities to guide the training process towards learning NDEs that are easier to solve. This approach opens up the blackbox numerical analysis behind the differential equation solver's algorithm and directly uses its local error estimates and stiffness heuristics as cheap and accurate cost estimates. We incorporate our method without any change in the underlying NDE framework and show that our method extends beyond Ordinary Differential Equations to accommodate Neural Stochastic Differential Equations. We demonstrate how our approach can halve the prediction time and, unlike other methods which can increase the training time by an order of magnitude, we demonstrate similar reduction in training times. Together this showcases how the knowledge embedded within state-of-the-art equation solvers can be used to enhance machine learning.
NAMar 29, 2021
Stiff Neural Ordinary Differential EquationsSuyong Kim, Weiqi Ji, Sili Deng et al.
Neural Ordinary Differential Equations (ODE) are a promising approach to learn dynamic models from time-series data in science and engineering applications. This work aims at learning Neural ODE for stiff systems, which are usually raised from chemical kinetic modeling in chemical and biological systems. We first show the challenges of learning neural ODE in the classical stiff ODE systems of Robertson's problem and propose techniques to mitigate the challenges associated with scale separations in stiff systems. We then present successful demonstrations in stiff systems of Robertson's problem and an air pollution problem. The demonstrations show that the usage of deep networks with rectified activations, proper scaling of the network outputs as well as loss functions, and stabilized gradient calculations are the key techniques enabling the learning of stiff neural ODE. The success of learning stiff neural ODE opens up possibilities of using neural ODEs in applications with widely varying time-scales, like chemical dynamics in energy conversion, environmental engineering, and the life sciences.
MTRL-SCINov 3, 2020
AutoMat: Accelerated Computational Electrochemical systems DiscoveryEmil Annevelink, Rachel Kurchin, Eric Muckley et al.
Large-scale electrification is vital to addressing the climate crisis, but several scientific and technological challenges remain to fully electrify both the chemical industry and transportation. In both of these areas, new electrochemical materials will be critical, but their development currently relies heavily on human-time-intensive experimental trial and error and computationally expensive first-principles, meso-scale and continuum simulations. We present an automated workflow, AutoMat, that accelerates these computational steps by introducing both automated input generation and management of simulations across scales from first principles to continuum device modeling. Furthermore, we show how to seamlessly integrate multi-fidelity predictions such as machine learning surrogates or automated robotic experiments "in-the-loop". The automated framework is implemented with design space search techniques to dramatically accelerate the overall materials discovery pipeline by implicitly learning design features that optimize device performance across several metrics. We discuss the benefits of AutoMat using examples in electrocatalysis and energy storage and highlight lessons learned.
LGJan 13, 2020
Universal Differential Equations for Scientific Machine LearningChristopher Rackauckas, Yingbo Ma, Julius Martensen et al.
In the context of science, the well-known adage "a picture is worth a thousand words" might well be "a model is worth a thousand datasets." In this manuscript we introduce the SciML software ecosystem as a tool for mixing the information of physical laws and scientific models with data-driven machine learning approaches. We describe a mathematical object, which we denote universal differential equations (UDEs), as the unifying framework connecting the ecosystem. We show how a wide variety of applications, from automatically discovering biological mechanisms to solving high-dimensional Hamilton-Jacobi-Bellman equations, can be phrased and efficiently handled through the UDE formalism and its tooling. We demonstrate the generality of the software tooling to handle stochasticity, delays, and implicit constraints. This funnels the wide variety of SciML applications into a core set of training mechanisms which are highly optimized, stabilized for stiff equations, and compatible with distributed parallelism and GPU accelerators.
SEJul 17, 2018
Confederated Modular Differential Equation APIs for Accelerated Algorithm Development and BenchmarkingChristopher Rackauckas, Qing Nie
Performant numerical solving of differential equations is required for large-scale scientific modeling. In this manuscript we focus on two questions: (1) how can researchers empirically verify theoretical advances and consistently compare methods in production software settings and (2) how can users (scientific domain experts) keep up with the state-of-the-art methods to select those which are most appropriate? Here we describe how the confederated modular API of DifferentialEquations.jl addresses these concerns. We detail the package-free API which allows numerical methods researchers to readily utilize and benchmark any compatible method directly in full-scale scientific applications. In addition, we describe how the complexity of the method choices is abstracted via a polyalgorithm. We show how scientific tooling built on top of DifferentialEquations.jl, such as packages for dynamical systems quantification and quantum optics simulation, both benefit from this structure and provide themselves as convenient benchmarking tools.