NAFeb 23, 2023
Backpropagation through Back Substitution with a BackslashAlan Edelman, Ekin Akyurek, Yuyang Wang · amazon-science, mit
We present a linear algebra formulation of backpropagation which allows the calculation of gradients by using a generically written ``backslash'' or Gaussian elimination on triangular systems of equations. Generally, the matrix elements are operators. This paper has three contributions: (i) it is of intellectual value to replace traditional treatments of automatic differentiation with a (left acting) operator theoretic, graph-based approach; (ii) operators can be readily placed in matrices in software in programming languages such as Julia as an implementation option; (iii) we introduce a novel notation, ``transpose dot'' operator ``$\{\}^{T_\bullet}$'' that allows for the reversal of operators. We further demonstrate the elegance of the operators approach in a suitable programming language consisting of generic linear algebra operators such as Julia \cite{bezanson2017julia}, and that it is possible to realize this abstraction in code. Our implementation shows how generic linear algebra can allow operators as elements of matrices. In contrast to ``operator overloading,'' where backslash would normally have to be rewritten to take advantage of operators, with ``generic programming'' there is no such need.
LGMar 3, 2023
Locally Regularized Neural Differential Equations: Some Black Boxes Were Meant to Remain Closed!Avik Pal, Alan Edelman, Chris Rackauckas
Implicit layer deep learning techniques, like Neural Differential Equations, have become an important modeling framework due to their ability to adapt to new problems automatically. Training a neural differential equation is effectively a search over a space of plausible dynamical systems. However, controlling the computational cost for these models is difficult since it relies on the number of steps the adaptive solver takes. Most prior works have used higher-order methods to reduce prediction timings while greatly increasing training time or reducing both training and prediction timings by relying on specific training algorithms, which are harder to use as a drop-in replacement due to strict requirements on automatic differentiation. In this manuscript, we use internal cost heuristics of adaptive differential equation solvers at stochastic time points to guide the training toward learning a dynamical system that is easier to integrate. We "close the black-box" and allow the use of our method with any adjoint technique for gradient calculations of the differential equation solution. We perform experimental studies to compare our method to global regularization to show that we attain similar performance numbers without compromising the flexibility of implementation on ordinary differential equations (ODEs) and stochastic differential equations (SDEs). We develop two sampling strategies to trade off between performance and training time. Our method reduces the number of function evaluations to 0.556-0.733x and accelerates predictions by 1.3-2x.
NAAug 9, 2018
Fast computation of the principal components of genotype matrices in JuliaJiahao Chen, Andreas Noack, Alan Edelman
Finding the largest few principal components of a matrix of genetic data is a common task in genome-wide association studies (GWASs), both for dimensionality reduction and for identifying unwanted factors of variation. We describe a simple random matrix model for matrices that arise in GWASs, showing that the singular values have a bulk behavior that obeys a Marchenko-Pastur distributed with a handful of large outliers. We also implement Golub-Kahan-Lanczos (GKL) bidiagonalization in the Julia programming language, providing thick restarting and a choice between full and partial reorthogonalization strategies to control numerical roundoff. Our implementation of GKL bidiagonalization is up to 36 times faster than software tools used commonly in genomics data analysis for computing principal components, such as EIGENSOFT and FlashPCA, which use dense LAPACK routines and randomized subspace iteration respectively.
27.0DCMar 19
Hierarchical Precision and Recursion for Accelerating Symmetric Linear Solves on MXUsVicki Carrica, Rabab Alomairy, Evelyne Ringoot et al.
Symmetric linear solves are fundamental to a wide range of scientific and engineering applications, from climate modeling and structural analysis to machine learning and optimization. These workloads often rely on Cholesky (POTRF) decomposition and its supporting operations, triangular solves (TRSM) and symmetric rank-k updates (SYRK), which together form the computational core for solving symmetric positive-definite systems. To accelerate these kernels, we present a portable, mixed-precision solver designed for Matrix Processing Units (MXUs), including NVIDIA Tensor Cores (H200) and AMD Matrix Cores (MI300X). Our algorithm builds on a nested recursive formulation in which Cholesky exposes parallelism through recursive decomposition of its TRSM and SYRK sub-problems. This structure yields a hierarchical recursion that maximizes GEMM throughput while enabling fine-grained control over numerical precision. We introduce a custom recursive data structure that assigns low-precision FP16 arithmetic to large off-diagonal blocks, while preserving high precision on diagonal blocks to ensure numerical stability. The solver is implemented in Julia, leveraging array programming, multiple dispatch, and dynamic type inference to enable seamless expression of mixed-precision computation. This design provides a high-level, hardware-agnostic interface while efficiently interfacing with low-level vendor libraries for backend portability. On H200, our recursive FP64 SYRK achieves a 14x speedup over cuBLAS, while mixed-precision delivers up to 27x speedup in SYRK and 5x in TRSM over full-precision baselines. This results in a 5x overall speedup for Cholesky versus cuSOLVER FP64, with 100x better accuracy than pure FP16 while retaining 88% of its peak speedup. Comparable performance and accuracy trends are observed on MI300X, demonstrating broad applicability across GPUs.
13.0NAMay 2
Sampling Pfaffian point processes and the symplectic Arnoldi methodAlan Edelman, Sungwoo Jeong, Simeon Schaub
We present an exact sampling algorithm for Pfaffian point processes based on a skew-symmetric analogue of the Cholesky factorization. This algorithm enables efficient sampling of a wide range of statistics arising in random matrix theory and combinatorics. For instance, we can sample eigenvalues of the orthogonal and symplectic ensembles ($β= 1,4$). In addition, we introduce a symplectic Arnoldi method for computing skew-orthogonal polynomials associated with a general weight function. This method can be used to efficiently construct the $2 \times 2$ matrix valued skew-symmetric kernels that arise in $β= 1,4$ polynomial ensembles. We illustrate our approach with several numerical examples and experiments, including the symmetric corner growth model, the finite-$N$ Gaussian (Hermite) orthogonal and symplectic ensembles, and the $β= 1,4$ Airy point processes and Tracy-Widom distributions.
LGFeb 25
ABM-UDE: Developing Surrogates for Epidemic Agent-Based Models via Scientific Machine LearningSharv Murgai, Utkarsh Utkarsh, Kyle C. Nguyen et al.
Agent-based epidemic models (ABMs) encode behavioral and policy heterogeneity but are too slow for nightly hospital planning. We develop county-ready surrogates that learn directly from exascale ABM trajectories using Universal Differential Equations (UDEs): mechanistic SEIR-family ODEs with a neural-parameterized contact rate $κ_φ(u,t)$ (no additive residual). Our contributions are threefold: we adapt multiple shooting and an observer-based prediction-error method (PEM) to stabilize identification of neural-augmented epidemiological dynamics across intervention-driven regime shifts; we enforce positivity and mass conservation and show the learned contact-rate parameterization yields a well-posed vector field; and we quantify accuracy, calibration, and compute against ABM ensembles and UDE baselines. On a representative ExaEpi scenario, PEM-UDE reduces mean MSE by 77% relative to single-shooting UDE (3.00 vs. 13.14) and by 20% relative to MS-UDE (3.75). Reliability improves in parallel: empirical coverage of ABM $10$-$90$% and $25$-$75$% bands rises from 0.68/0.43 (UDE) and 0.79/0.55 (MS-UDE) to 0.86/0.61 with PEM-UDE and 0.94/0.69 with MS+PEM-UDE, indicating calibrated uncertainty rather than overconfident fits. Inference runs in seconds on commodity CPUs (20-35 s per $\sim$90-day forecast), enabling nightly ''what-if'' sweeps on a laptop. Relative to a $\sim$100 CPU-hour ABM reference run, this yields $\sim10^{4}\times$ lower wall-clock per scenario. This closes the realism-cadence gap, supports threshold-aware decision-making (e.g., maintaining ICU occupancy $<75$%), preserves mechanistic interpretability, and enables calibrated, risk-aware scenario planning on standard institutional hardware. Beyond epidemics, the ABM$\to$UDE recipe provides a portable path to distill agent-based simulators into fast, trustworthy surrogates for other scientific domains.
43.2NAMar 15
The largest 5th pivot may be the root of a 61st degree polynomialJames Chen, Alan Edelman, John Urschel
This paper introduces a number of new techniques in the study of the famous question from numerical linear algebra: what is the largest possible growth factor when performing Gaussian elimination with complete pivoting? This question is highly complex, due to a complicated set of polynomial inequalities that need to be simultaneously satisfied. This paper introduces the JuMP + Groebner basis + discriminant polynomial approach as well as the use of interval arithmetic computations. Thus, we are introducing a marriage of numerical and exact mathematical computations. In 1988, Day and Peterson performed numerical optimization on $n=5$ with NPSOL and obtained a largest seen value of $4.1325...$. This same best value was reproduced by Gould with LANCELOT in 1991. We ran extensive comparable experiments with the modern software tool JuMP and also saw the same value $4.1325...$. While the combinatorial explosion of possibilities prevents us from knowing whether there may not be a larger maximum, we succeed in obtaining the exact mathematical value: the number $4.1325...$ is exactly the root of a 61st degree polynomial provided in this work, and is a maximum given the equality constraints seen by JuMP. In light of the numerics, we pose the conjecture that this lower bound is indeed the maximum. We also apply this technique to $n = 6$, $7$, and $8$. Furthermore, in 1969, an upper bound of $4\frac{17}{18}\approx 4.94$ was produced for the maximum possible growth for $n = 5$. We slightly lower this upper bound to $4.84$.
LGJun 4, 2025
Physics-Constrained Flow Matching: Sampling Generative Models with Hard ConstraintsUtkarsh Utkarsh, Pengfei Cai, Alan Edelman et al.
Deep generative models have recently been applied to physical systems governed by partial differential equations (PDEs), offering scalable simulation and uncertainty-aware inference. However, enforcing physical constraints, such as conservation laws (linear and nonlinear) and physical consistencies, remains challenging. Existing methods often rely on soft penalties or architectural biases that fail to guarantee hard constraints. In this work, we propose Physics-Constrained Flow Matching (PCFM), a zero-shot inference framework that enforces arbitrary nonlinear constraints in pretrained flow-based generative models. PCFM continuously guides the sampling process through physics-based corrections applied to intermediate solution states, while remaining aligned with the learned flow and satisfying physical constraints. Empirically, PCFM outperforms both unconstrained and constrained baselines on a range of PDEs, including those with shocks, discontinuities, and sharp features, while ensuring exact constraint satisfaction at the final solution. Our method provides a general framework for enforcing hard constraints in both scientific and general-purpose generative models, especially in applications where constraint satisfaction is essential.
HOJan 7, 2025
Matrix Calculus (for Machine Learning and Beyond)Paige Bright, Alan Edelman, Steven G. Johnson
This course, intended for undergraduates familiar with elementary calculus and linear algebra, introduces the extension of differential calculus to functions on more general vector spaces, such as functions that take as input a matrix and return a matrix inverse or factorization, derivatives of ODE solutions, and even stochastic derivatives of random functions. It emphasizes practical computational applications, such as large-scale optimization and machine learning, where derivatives must be re-imagined in order to be propagated through complicated calculations. The class also discusses efficiency concerns leading to "adjoint" or "reverse-mode" differentiation (a.k.a. "backpropagation"), and gives a gentle introduction to modern automatic differentiation (AD) techniques.
LGJul 4, 2025
Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural PopulationsAnthony G. Chesebro, David Hofmann, Vaibhav Dixit et al.
Discovering governing equations that describe complex chaotic systems remains a fundamental challenge in physics and neuroscience. Here, we introduce the PEM-UDE method, which combines the prediction-error method with universal differential equations to extract interpretable mathematical expressions from chaotic dynamical systems, even with limited or noisy observations. This approach succeeds where traditional techniques fail by smoothing optimization landscapes and removing the chaotic properties during the fitting process without distorting optimal parameters. We demonstrate its efficacy by recovering hidden states in the Rossler system and reconstructing dynamics from noise-corrupted electrical circuit data, where the correct functional form of the dynamics is recovered even when one of the observed time series is corrupted by noise 5x the magnitude of the true signal. We demonstrate that this method is capable of recovering the correct dynamics, whereas direct symbolic regression methods, such as SINDy, fail to do so with the given amount of data and noise. Importantly, when applied to neural populations, our method derives novel governing equations that respect biological constraints such as network sparsity - a constraint necessary for cortical information processing yet not captured in next-generation neural mass models - while preserving microscale neuronal parameters. These equations predict an emergent relationship between connection density and both oscillation frequency and synchrony in neural circuits. We validate these predictions using three intracranial electrode recording datasets from the medial entorhinal cortex, prefrontal cortex, and orbitofrontal cortex. Our work provides a pathway to develop mechanistic, multi-scale brain models that generalize across diverse neural architectures, bridging the gap between single-neuron dynamics and macroscale brain activity.
LGJan 28, 2022
Continuous Deep Equilibrium Models: Training Neural ODEs faster by integrating them to InfinityAvik Pal, Alan Edelman, Christopher Rackauckas
Implicit models separate the definition of a layer from the description of its solution process. While implicit layers allow features such as depth to adapt to new scenarios and inputs automatically, this adaptivity makes its computational expense challenging to predict. In this manuscript, we increase the "implicitness" of the DEQ by redefining the method in terms of an infinite time neural ODE, which paradoxically decreases the training cost over a standard neural ODE by 2-4x. Additionally, we address the question: is there a way to simultaneously achieve the robustness of implicit layers while allowing the reduced computational expense of an explicit layer? To solve this, we develop Skip and Skip Reg. DEQ, an implicit-explicit (IMEX) layer that simultaneously trains an explicit prediction followed by an implicit correction. We show that training this explicit predictor is free and even decreases the training time by 1.11-3.19x. Together, this manuscript shows how bridging the dichotomy of implicit and explicit deep learning can combine the advantages of both techniques.
CLMay 9, 2021
High-performance symbolic-numerics via multiple dispatchShashi Gowda, Yingbo Ma, Alessandro Cheli et al.
As mathematical computing becomes more democratized in high-level languages, high-performance symbolic-numeric systems are necessary for domain scientists and engineers to get the best performance out of their machine without deep knowledge of code optimization. Naturally, users need different term types either to have different algebraic properties for them, or to use efficient data structures. To this end, we developed Symbolics.jl, an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs. In this work we detail an underlying abstract term interface which allows for speed without sacrificing generality. We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system without changing the pre-existing term rewriters. We showcase how this can be used to optimize term construction and give a 113x acceleration on general symbolic transformations. Further, we show that such a generic API allows for complementary term-rewriting implementations. We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers. We showcase an e-graph ruleset which minimizes the number of CPU cycles during expression evaluation, and demonstrate how it simplifies a real-world reaction-network simulation to halve the runtime. Additionally, we show a reaction-diffusion partial differential equation solver which is able to be automatically converted into symbolic expressions via multiple dispatch tracing, which is subsequently accelerated and parallelized to give a 157x simulation speedup. Together, this presents Symbolics.jl as a next-generation symbolic-numeric computing environment geared towards modeling and simulation.
MTRL-SCINov 3, 2020
AutoMat: Accelerated Computational Electrochemical systems DiscoveryEmil Annevelink, Rachel Kurchin, Eric Muckley et al.
Large-scale electrification is vital to addressing the climate crisis, but several scientific and technological challenges remain to fully electrify both the chemical industry and transportation. In both of these areas, new electrochemical materials will be critical, but their development currently relies heavily on human-time-intensive experimental trial and error and computationally expensive first-principles, meso-scale and continuum simulations. We present an automated workflow, AutoMat, that accelerates these computational steps by introducing both automated input generation and management of simulations across scales from first principles to continuum device modeling. Furthermore, we show how to seamlessly integrate multi-fidelity predictions such as machine learning surrogates or automated robotic experiments "in-the-loop". The automated framework is implemented with design space search techniques to dramatically accelerate the overall materials discovery pipeline by implicitly learning design features that optimize device performance across several metrics. We discuss the benefits of AutoMat using examples in electrocatalysis and energy storage and highlight lessons learned.
LGOct 7, 2020
Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State NetworksRanjan Anantharaman, Yingbo Ma, Shashi Gowda et al.
Modern design, control, and optimization often requires simulation of highly nonlinear models, leading to prohibitive computational costs. These costs can be amortized by evaluating a cheap surrogate of the full model. Here we present a general data-driven method, the continuous-time echo state network (CTESN), for generating surrogates of nonlinear ordinary differential equations with dynamics at widely separated timescales. We empirically demonstrate near-constant time performance using our CTESNs on a physically motivated scalable model of a heating system whose full execution time increases exponentially, while maintaining relative error of within 0.2 %. We also show that our model captures fast transients as well as slow dynamics effectively, while other techniques such as physics informed neural networks have difficulties trying to train and predict the highly nonlinear behavior of these models.
LGJul 23, 2020
Signal Enhancement for Magnetic Navigation Challenge ProblemAlbert R. Gnadt, Joseph Belarge, Aaron Canciani et al.
Harnessing the magnetic field of the Earth for navigation has shown promise as a viable alternative to other navigation systems. A magnetic navigation system collects its own magnetic field data using a magnetometer and uses magnetic anomaly maps to determine the current location. The greatest challenge with magnetic navigation arises when the magnetic field measurements from the magnetometer encompass the magnetic field from not just the Earth, but also from the vehicle on which it is mounted. It is difficult to separate the Earth magnetic anomaly field, which is crucial for navigation, from the total magnetic field reading from the sensor. The purpose of this challenge problem is to decouple the Earth and aircraft magnetic signals in order to derive a clean signal from which to perform magnetic navigation. Baseline testing on the dataset has shown that the Earth magnetic field can be extracted from the total magnetic field using machine learning (ML). The challenge is to remove the aircraft magnetic field from the total magnetic field using a trained model. This challenge offers an opportunity to construct an effective model for removing the aircraft magnetic field from the dataset by using a scientific machine learning (SciML) approach comprised of an ML algorithm integrated with the physics of magnetic navigation.
LGJan 13, 2020
Universal Differential Equations for Scientific Machine LearningChristopher Rackauckas, Yingbo Ma, Julius Martensen et al.
In the context of science, the well-known adage "a picture is worth a thousand words" might well be "a model is worth a thousand datasets." In this manuscript we introduce the SciML software ecosystem as a tool for mixing the information of physical laws and scientific models with data-driven machine learning approaches. We describe a mathematical object, which we denote universal differential equations (UDEs), as the unifying framework connecting the ecosystem. We show how a wide variety of applications, from automatically discovering biological mechanisms to solving high-dimensional Hamilton-Jacobi-Bellman equations, can be phrased and efficiently handled through the UDE formalism and its tooling. We demonstrate the generality of the software tooling to handle stochasticity, delays, and implicit constraints. This funnels the wide variety of SciML applications into a core set of training mechanisms which are highly optimized, stabilized for stiff equations, and compatible with distributed parallelism and GPU accelerators.
PLJul 17, 2019
A Differentiable Programming System to Bridge Machine Learning and Scientific ComputingMike Innes, Alan Edelman, Keno Fischer et al.
Scientific computing is increasingly incorporating the advancements in machine learning and the ability to work with large amounts of data. At the same time, machine learning models are becoming increasingly sophisticated and exhibit many features often seen in scientific computing, stressing the capabilities of machine learning frameworks. Just as the disciplines of scientific computing and machine learning have shared common underlying infrastructure in the form of numerical linear algebra, we now have the opportunity to further share new computational infrastructure, and thus ideas, in the form of Differentiable Programming. We describe Zygote, a Differentiable Programming system that is able to take gradients of general program structures. We implement this system in the Julia programming language. Our system supports almost all language constructs (control flow, recursion, mutation, etc.) and compiles high-performance code without requiring any user intervention or refactoring to stage computations. This enables an expressive programming model for deep learning, but more importantly, it enables us to incorporate a large ecosystem of libraries in our models in a straightforward way. We discuss our approach to automatic differentiation, including its support for advanced techniques such as mixed-mode, complex and checkpointed differentiation, and present several examples of differentiating programs.
CVDec 28, 2016
Accelerated Convolutions for Efficient Multi-Scale Time to Contact Computation in JuliaAlexander Amini, Berthold Horn, Alan Edelman
Convolutions have long been regarded as fundamental to applied mathematics, physics and engineering. Their mathematical elegance allows for common tasks such as numerical differentiation to be computed efficiently on large data sets. Efficient computation of convolutions is critical to artificial intelligence in real-time applications, like machine vision, where convolutions must be continuously and efficiently computed on tens to hundreds of kilobytes per second. In this paper, we explore how convolutions are used in fundamental machine vision applications. We present an accelerated n-dimensional convolution package in the high performance computing language, Julia, and demonstrate its efficacy in solving the time to contact problem for machine vision. Results are measured against synthetically generated videos and quantitatively assessed according to their mean squared error from the ground truth. We achieve over an order of magnitude decrease in compute time and allocated memory for comparable machine vision applications. All code is packaged and integrated into the official Julia Package Manager to be used in various other scenarios.
PROct 31, 2007
The polynomial method for random matricesN. Raj Rao, Alan Edelman
We define a class of "algebraic" random matrices. These are random matrices for which the Stieltjes transform of the limiting eigenvalue distribution function is algebraic, i.e., it satisfies a (bivariate) polynomial equation. The Wigner and Wishart matrices whose limiting eigenvalue distributions are given by the semi-circle law and the Marcenko-Pastur law are special cases. Algebraicity of a random matrix sequence is shown to act as a certificate of the computability of the limiting eigenvalue density function. The limiting moments of algebraic random matrix sequences, when they exist, are shown to satisfy a finite depth linear recursion so that they may often be efficiently enumerated in closed form. In this article, we develop the mathematics of the polynomial method which allows us to describe the class of algebraic matrices by its generators and map the constructive approach we employ when proving algebraicity into a software implementation that is available for download in the form of the RMTool random matrix "calculator" package. Our characterization of the closure of algebraic probability distributions under free additive and multiplicative convolution operations allows us to simultaneously establish a framework for computational (non-commutative) "free probability" theory. We hope that the tools developed allow researchers to finally harness the power of the infinite random matrix theory.
PRMay 16, 2005
The Efficient Evaluation of the Hypergeometric Function of a Matrix ArgumentPlamen Koev, Alan Edelman
We present new algorithms that efficiently approximate the hypergeometric function of a matrix argument through its expansion as a series of Jack functions. Our algorithms exploit the combinatorial properties of the Jack function, and have complexity that is only linear in the size of the matrix.
MATH-PHJan 27, 2005
Numerical Methods for Eigenvalue Distributions of Random MatricesAlan Edelman, Per-Olof Persson
We present efficient numerical techniques for calculation of eigenvalue distributions of random matrices in the beta-ensembles. We compute histograms using direct simulations on very large matrices, by using tridiagonal matrices with appropriate simplifications. The distributions are also obtained by numerical solution of the Painleve II and V equations with high accuracy. For the spacings we show a technique based on the Prolate matrix and Richardson extrapolation, and we compare the distributions with the zeros of the Riemann zeta function.
COMP-PHJun 19, 1998
The Geometry of Algorithms with Orthogonality ConstraintsAlan Edelman, T. A. Arias, Steven T. Smith
In this paper we develop new Newton and conjugate gradient algorithms on the Grassmann and Stiefel manifolds. These manifolds represent the constraints that arise in such areas as the symmetric eigenvalue problem, nonlinear eigenvalue problems, electronic structures computations, and signal processing. In addition to the new algorithms, we show how the geometrical framework gives penetrating new insights allowing us to create, understand, and compare algorithms. The theory proposed here provides a taxonomy for numerical linear algebra algorithms that provide a top level mathematical view of previously unrelated algorithms. It is our hope that developers of new algorithms and perturbation theories will benefit from the theory, methods, and examples in this paper.