Nicolas Boullé

NA
h-index49
22papers
616citations
Novelty54%
AI Score58

22 Papers

99.4NAJun 2
Physics-guided correction for operator learning under model misspecification

Lei Ma, Nicolas Boullé, Yu-Sen Yang et al.

Physics-informed operator learning provides an efficient framework for approximating solution operators of partial differential equations by combining observational data with governing physical laws. However, most existing methods implicitly assume that the prescribed governing equation is accurate. This assumption may fail in practical applications, where model simplifications, missing physical effects, parameter drift, or incomplete constitutive relations can lead to model misspecification. In this work, we propose a physics-guided operator correction framework for learning solution operators under misspecified governing equations. At the operator level, the target mapping is decomposed into a prior operator induced by an approximate physical model and a learnable correction operator that accounts for the remaining discrepancy. Although the formulation is architecture independent, we realize it using a serial DeepONet architecture, where the first DeepONet provides a prior prediction and the second DeepONet learns an additive correction conditioned on both the input function and the prior prediction. The learned correction is incorporated into the physics residual and trained together with data-consistency constraints, allowing the model to retain useful physical structure while adapting to inaccurate governing equations. Numerical experiments on diffusion-reaction, Burgers, cavity flow, and hyperelastic problems show that the proposed method substantially reduces errors induced by misspecified physics. Additional tests under sparse and noisy observations further demonstrate the robustness of the framework and its ability to provide informative uncertainty estimates through deep ensembles.

68.1LGJun 3
Deep Embedded Multiplicative DMD for Algebra-Preserving Koopman Learning

Kelan Gray, Finlay Brown, Nicolas Boullé et al.

Koopman theory turns nonlinear dynamics into a linear spectral problem. In computation, however, everything depends on a hard finite-dimensional choice: the observables must be expressive, nearly invariant under the dynamics, and, ideally, compatible with composition. Deep Koopman methods learn flexible coordinates, whereas structure-preserving methods enforce operator identities on fixed dictionaries. We combine these ideas by introducing Deep Embedded Multiplicative Dynamic Mode Decomposition (DeepMDMD), a method that learns a latent space and a partition of it, while enforcing the Koopman product rule as an exact algebraic constraint. Training alternates between an exact multiplicative operator update and a differentiable latent-clustering step that promotes Koopman closure. The result is a finite transition map on learned latent cells. Its nonzero spectrum lies on the unit circle, its dictionary is shaped by the dynamics rather than by ambient geometry, and forecasts are made in latent coordinates before being decoded to physical space. Across Hamiltonian, chaotic, and fluid examples, DeepMDMD learns dictionaries that are far more compact and dynamically coherent than those produced by geometric MDMD partitions. It reduces spectral pollution, reveals richer continuous-spectrum structure, and gives stable forecasts under severe noise. In high-dimensional flows, including a 158,624-dimensional cylinder wake and a noisy $Re=20,000$ lid-driven cavity, it preserves coherent structures and long-time spectral statistics where state-space MDMD fails. These results suggest a practical rule for Koopman learning: learn the coordinates, constrain the algebra.

LGFeb 24, 2023
Elliptic PDE learning is provably data-efficient

Nicolas Boullé, Diana Halikias, Alex Townsend

PDE learning is an emerging field that combines physics and machine learning to recover unknown physical systems from experimental data. While deep learning models traditionally require copious amounts of training data, recent PDE learning techniques achieve spectacular results with limited data availability. Still, these results are empirical. Our work provides theoretical guarantees on the number of input-output training pairs required in PDE learning. Specifically, we exploit randomized numerical linear algebra and PDE theory to derive a provably data-efficient algorithm that recovers solution operators of 3D uniformly elliptic PDEs from input-output data and achieves an exponential convergence rate of the error with respect to the size of the training dataset with an exceptionally high probability of success.

NAApr 27, 2022
Learning Green's functions associated with time-dependent partial differential equations

Nicolas Boullé, Seick Kim, Tianyi Shi et al.

Neural operators are a popular technique in scientific machine learning to learn a mathematical model of the behavior of unknown physical systems from data. Neural operators are especially useful to learn solution operators associated with partial differential equations (PDEs) from pairs of forcing functions and solutions when numerical solvers are not available or the underlying physics is poorly understood. In this work, we attempt to provide theoretical foundations to understand the amount of training data needed to learn time-dependent PDEs. Given input-output pairs from a parabolic PDE in any spatial dimension $n\geq 1$, we derive the first theoretically rigorous scheme for learning the associated solution operator, which takes the form of a convolution with a Green's function $G$. Until now, rigorously learning Green's functions associated with time-dependent PDEs has been a major challenge in the field of scientific machine learning because $G$ may not be square-integrable when $n>1$, and time-dependent PDEs have transient dynamics. By combining the hierarchical low-rank structure of $G$ together with randomized numerical linear algebra, we construct an approximant to $G$ that achieves a relative error of $\smash{\mathcal{O}(Γ_ε^{-1/2}ε)}$ in the $L^1$-norm with high probability by using at most $\smash{\mathcal{O}(ε^{-\frac{n+2}{2}}\log(1/ε))}$ input-output training pairs, where $Γ_ε$ is a measure of the quality of the training dataset for learning $G$, and $ε>0$ is sufficiently small.

CLJan 23Code
Jacobian Scopes: token-level causal attributions in LLMs

Toni J. B. Liu, Baran Zadeoğlu, Nicolas Boullé et al.

Large language models (LLMs) make next-token predictions based on clues present in their context, such as semantic descriptions and in-context examples. Yet, elucidating which prior tokens most strongly influence a given prediction remains challenging due to the proliferation of layers and attention heads in modern architectures. We propose Jacobian Scopes, a suite of gradient-based, token-level causal attribution methods for interpreting LLM predictions. By analyzing the linearized relations of final hidden state with respect to inputs, Jacobian Scopes quantify how input tokens influence a model's prediction. We introduce three variants - Semantic, Fisher, and Temperature Scopes - which respectively target sensitivity of specific logits, the full predictive distribution, and model confidence (inverse temperature). Through case studies spanning instruction understanding, translation and in-context learning (ICL), we uncover interesting findings, such as when Jacobian Scopes point to implicit political biases. We believe that our proposed methods also shed light on recently debated mechanisms underlying in-context time-series forecasting. Our code and interactive demonstrations are publicly available at https://github.com/AntonioLiu97/JacobianScopes.

MLFeb 3, 2025Code
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods

Oussama Zekri, Nicolas Boullé

Discrete diffusion models have recently gained significant attention due to their ability to process complex discrete structures for language modeling. However, fine-tuning these models with policy gradient methods, as is commonly done in Reinforcement Learning from Human Feedback (RLHF), remains a challenging task. We propose an efficient, broadly applicable, and theoretically justified policy gradient algorithm, called Score Entropy Policy Optimization (SEPO), for fine-tuning discrete diffusion models over non-differentiable rewards. Our numerical experiments across several discrete generative tasks demonstrate the scalability and efficiency of our method. Our code is available at https://github.com/ozekri/SEPO.

NADec 22, 2023
A Mathematical Guide to Operator Learning

Nicolas Boullé, Alex Townsend

Operator learning aims to discover properties of an underlying dynamical system or partial differential equation (PDE) from data. Here, we present a step-by-step guide to operator learning. We explain the types of problems and PDEs amenable to operator learning, discuss various neural network architectures, and explain how to employ numerical PDE solvers effectively. We also give advice on how to create and manage training data and conduct optimization. We offer intuition behind the various neural network architectures employed in operator learning by motivating them from the point-of-view of numerical linear algebra.

NAOct 28, 2022
Data-driven discovery of Green's functions

Nicolas Boullé

Discovering hidden partial differential equations (PDEs) and operators from data is an important topic at the frontier between machine learning and numerical analysis. This doctoral thesis introduces theoretical results and deep learning algorithms to learn Green's functions associated with linear partial differential equations and rigorously justify PDE learning techniques. A theoretically rigorous algorithm is derived to obtain a learning rate, which characterizes the amount of training data needed to approximately learn Green's functions associated with elliptic PDEs. The construction connects the fields of PDE learning and numerical linear algebra by extending the randomized singular value decomposition to non-standard Gaussian vectors and Hilbert--Schmidt operators, and exploiting the low-rank hierarchical structure of Green's functions using hierarchical matrices. Rational neural networks (NNs) are introduced and consist of neural networks with trainable rational activation functions. The highly compositional structure of these networks, combined with rational approximation theory, implies that rational functions have higher approximation power than standard activation functions. In addition, rational NNs may have poles and take arbitrarily large values, which is ideal for approximating functions with singularities such as Green's functions. Finally, theoretical results on Green's functions and rational NNs are combined to design a human-understandable deep learning method for discovering Green's functions from data. This approach complements state-of-the-art PDE learning techniques, as a wide range of physics can be captured from the learned Green's functions such as dominant modes, symmetries, and singularity locations.

LGFeb 1, 2024
LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law

Toni J. B. Liu, Nicolas Boullé, Raphaël Sarfati et al.

Pretrained large language models (LLMs) are surprisingly effective at performing zero-shot tasks, including time-series forecasting. However, understanding the mechanisms behind such capabilities remains highly challenging due to the complexity of the models. We study LLMs' ability to extrapolate the behavior of dynamical systems whose evolution is governed by principles of physical interest. Our results show that LLaMA 2, a language model trained primarily on texts, achieves accurate predictions of dynamical system time series without fine-tuning or prompt engineering. Moreover, the accuracy of the learned physical rules increases with the length of the input context window, revealing an in-context version of neural scaling law. Along the way, we present a flexible and efficient algorithm for extracting probability density functions of multi-digit numbers directly from LLMs.

DSMay 8, 2024
Multiplicative Dynamic Mode Decomposition

Nicolas Boullé, Matthew J. Colbrook

Koopman operators are infinite-dimensional operators that linearize nonlinear dynamical systems, facilitating the study of their spectral properties and enabling the prediction of the time evolution of observable quantities. Recent methods have aimed to approximate Koopman operators while preserving key structures. However, approximating Koopman operators typically requires a dictionary of observables to capture the system's behavior in a finite-dimensional subspace. The selection of these functions is often heuristic, may result in the loss of spectral information, and can severely complicate structure preservation. This paper introduces Multiplicative Dynamic Mode Decomposition (MultDMD), which enforces the multiplicative structure inherent in the Koopman operator within its finite-dimensional approximation. Leveraging this multiplicative property, we guide the selection of observables and define a constrained optimization problem for the matrix approximation, which can be efficiently solved. MultDMD presents a structured approach to finite-dimensional approximations and can more accurately reflect the spectral properties of the Koopman operator. We elaborate on the theoretical framework of MultDMD, detailing its formulation, optimization strategy, and convergence properties. The efficacy of MultDMD is demonstrated through several examples, including the nonlinear pendulum, the Lorenz system, and fluid dynamics data, where we demonstrate its remarkable robustness to noise.

NAJan 31, 2024
Operator learning without the adjoint

Nicolas Boullé, Diana Halikias, Samuel E. Otto et al.

There is a mystery at the heart of operator learning: how can one recover a non-self-adjoint operator from data without probing the adjoint? Current practical approaches suggest that one can accurately recover an operator while only using data generated by the forward action of the operator without access to the adjoint. However, naively, it seems essential to sample the action of the adjoint. In this paper, we partially explain this mystery by proving that without querying the adjoint, one can approximate a family of non-self-adjoint infinite-dimensional compact operators via projection onto a Fourier basis. We then apply the result to recovering Green's functions of elliptic partial differential operators and derive an adjoint-free sample complexity bound. While existing theory justifies low sample complexity in operator learning, ours is the first adjoint-free analysis that attempts to close the gap between theory and practice.

NAJun 18, 2025
Convergent Methods for Koopman Operators on Reproducing Kernel Hilbert Spaces

Nicolas Boullé, Matthew J. Colbrook, Gustav Conradie

Data-driven spectral analysis of Koopman operators is a powerful tool for understanding numerous real-world dynamical systems, from neuronal activity to variations in sea surface temperature. The Koopman operator acts on a function space and is most commonly studied on the space of square-integrable functions. However, defining it on a suitable reproducing kernel Hilbert space (RKHS) offers numerous practical advantages, including pointwise predictions with error bounds, improved spectral properties that facilitate computations, and more efficient algorithms, particularly in high dimensions. We introduce the first general, provably convergent, data-driven algorithms for computing spectral properties of Koopman and Perron--Frobenius operators on RKHSs. These methods efficiently compute spectra and pseudospectra with error control and spectral measures while exploiting the RKHS structure to avoid the large-data limits required in the $L^2$ settings. The function space is determined by a user-specified kernel, eliminating the need for quadrature-based sampling as in $L^2$ and enabling greater flexibility with finite, externally provided datasets. Using the Solvability Complexity Index hierarchy, we construct adversarial dynamical systems for these problems to show that no algorithm can succeed in fewer limits, thereby proving the optimality of our algorithms. Notably, this impossibility extends to randomized algorithms and datasets. We demonstrate the effectiveness of our algorithms on challenging, high-dimensional datasets arising from real-world measurements and high-fidelity numerical simulations, including turbulent channel flow, molecular dynamics of a binding protein, Antarctic sea ice concentration, and Northern Hemisphere sea surface height. The algorithms are publicly available in the software package $\texttt{SpecRKHS}$.

NAJan 6, 2024
On the Convergence of Hermitian Dynamic Mode Decomposition

Nicolas Boullé, Matthew J. Colbrook

We study the convergence of Hermitian Dynamic Mode Decomposition (DMD) to the spectral properties of self-adjoint Koopman operators. Hermitian DMD is a data-driven method that approximates the Koopman operator associated with an unknown nonlinear dynamical system, using discrete-time snapshots. This approach preserves the self-adjointness of the operator in its finite-dimensional approximations. \rev{We prove that, under suitably broad conditions, the spectral measures corresponding to the eigenvalues and eigenfunctions computed by Hermitian DMD converge to those of the underlying Koopman operator}. This result also applies to skew-Hermitian systems (after multiplication by $i$), applicable to generators of continuous-time measure-preserving systems. Along the way, we establish a general theorem on the convergence of spectral measures for finite sections of self-adjoint operators, including those that are unbounded, which is of independent interest to the wider spectral community. We numerically demonstrate our results by applying them to two-dimensional Schrödinger equations.

LGMay 19, 2025
Multi-Level Monte Carlo Training of Neural Operators

James Rowbottom, Stefania Fresca, Pietro Lio et al.

Operator learning is a rapidly growing field that aims to approximate nonlinear operators related to partial differential equations (PDEs) using neural operators. These rely on discretization of input and output functions and are, usually, expensive to train for large-scale problems at high-resolution. Motivated by this, we present a Multi-Level Monte Carlo (MLMC) approach to train neural operators by leveraging a hierarchy of resolutions of function dicretization. Our framework relies on using gradient corrections from fewer samples of fine-resolution data to decrease the computational cost of training while maintaining a high level accuracy. The proposed MLMC training procedure can be applied to any architecture accepting multi-resolution data. Our numerical experiments on a range of state-of-the-art models and test-cases demonstrate improved computational efficiency compared to traditional single-resolution training approaches, and highlight the existence of a Pareto curve between accuracy and computational time, related to the number of samples per resolution.

LGSep 8, 2025
Text-Trained LLMs Can Zero-Shot Extrapolate PDE Dynamics

Jiajun Bao, Nicolas Boullé, Toni J. B. Liu et al.

Large language models (LLMs) have demonstrated emergent in-context learning (ICL) capabilities across a range of tasks, including zero-shot time-series forecasting. We show that text-trained foundation models can accurately extrapolate spatiotemporal dynamics from discretized partial differential equation (PDE) solutions without fine-tuning or natural language prompting. Predictive accuracy improves with longer temporal contexts but degrades at finer spatial discretizations. In multi-step rollouts, where the model recursively predicts future spatial states over multiple time steps, errors grow algebraically with the time horizon, reminiscent of global error accumulation in classical finite-difference solvers. We interpret these trends as in-context neural scaling laws, where prediction quality varies predictably with both context length and output length. To better understand how LLMs are able to internally process PDE solutions so as to accurately roll them out, we analyze token-level output distributions and uncover a consistent ICL progression: beginning with syntactic pattern imitation, transitioning through an exploratory high-entropy phase, and culminating in confident, numerically grounded predictions.

CLMay 19, 2025
What's in a prompt? Language models encode literary style in prompt embeddings

Raphaël Sarfati, Haley Moller, Toni J. B. Liu et al.

Large language models use high-dimensional latent spaces to encode and process textual information. Much work has investigated how the conceptual content of words translates into geometrical relationships between their vector representations. Fewer studies analyze how the cumulative information of an entire prompt becomes condensed into individual embeddings under the action of transformer layers. We use literary pieces to show that information about intangible, rather than factual, aspects of the prompt are contained in deep representations. We observe that short excerpts (10 - 100 tokens) from different novels separate in the latent space independently from what next-token prediction they converge towards. Ensembles from books from the same authors are much more entangled than across authors, suggesting that embeddings encode stylistic features. This geometry of style may have applications for authorship attribution and literary analysis, but most importantly reveals the sophistication of information processing and compression accomplished by language models.

FLU-DYNNov 30, 2024
Operator learning regularization for macroscopic permeability prediction in dual-scale flow problem

Christina Runkel, Sinan Xiao, Nicolas Boullé et al.

Liquid composites moulding is an important manufacturing technology for fibre reinforced composites, due to its cost-effectiveness. Challenges lie in the optimisation of the process due to the lack of understanding of key characteristic of textile fabrics - permeability. The problem of computing the permeability coefficient can be modelled as the well-known Stokes-Brinkman equation, which introduces a heterogeneous parameter $β$ distinguishing macropore regions and fibre-bundle regions. In the present work, we train a Fourier neural operator to learn the nonlinear map from the heterogeneous coefficient $β$ to the velocity field $u$, and recover the corresponding macroscopic permeability $K$. This is a challenging inverse problem since both the input and output fields span several order of magnitudes, we introduce different regularization techniques for the loss function and perform a quantitative comparison between them.

NAMay 27, 2021
A generalization of the randomized singular value decomposition

Nicolas Boullé, Alex Townsend

The randomized singular value decomposition (SVD) is a popular and effective algorithm for computing a near-best rank $k$ approximation of a matrix $A$ using matrix-vector products with standard Gaussian vectors. Here, we generalize the randomized SVD to multivariate Gaussian vectors, allowing one to incorporate prior knowledge of $A$ into the algorithm. This enables us to explore the continuous analogue of the randomized SVD for Hilbert--Schmidt (HS) operators using operator-function products with functions drawn from a Gaussian process (GP). We then construct a new covariance kernel for GPs, based on weighted Jacobi polynomials, which allows us to rapidly sample the GP and control the smoothness of the randomly generated functions. Numerical examples on matrices and HS operators demonstrate the applicability of the algorithm.

LGMay 1, 2021
Data-driven discovery of Green's functions with human-understandable deep learning

Nicolas Boullé, Christopher J. Earls, Alex Townsend

There is an opportunity for deep learning to revolutionize science and technology by revealing its findings in a human interpretable manner. To do this, we develop a novel data-driven approach for creating a human-machine partnership to accelerate scientific discovery. By collecting physical system responses under excitations drawn from a Gaussian process, we train rational neural networks to learn Green's functions of hidden linear partial differential equations. These functions reveal human-understandable properties and features, such as linear conservation laws and symmetries, along with shock and singularity locations, boundary effects, and dominant modes. We illustrate the technique on several examples and capture a range of physics, including advection-diffusion, viscous shocks, and Stokes flow in a lid-driven cavity.

NAJan 31, 2021
Learning elliptic partial differential equations with randomized linear algebra

Nicolas Boullé, Alex Townsend

Given input-output pairs of an elliptic partial differential equation (PDE) in three dimensions, we derive the first theoretically-rigorous scheme for learning the associated Green's function $G$. By exploiting the hierarchical low-rank structure of $G$, we show that one can construct an approximant to $G$ that converges almost surely and achieves a relative error of $\mathcal{O}(Γ_ε^{-1/2}\log^3(1/ε)ε)$ using at most $\mathcal{O}(ε^{-6}\log^4(1/ε))$ input-output training pairs with high probability, for any $0<ε<1$. The quantity $0<Γ_ε\leq 1$ characterizes the quality of the training dataset. Along the way, we extend the randomized singular value decomposition algorithm for learning matrices to Hilbert--Schmidt operators and characterize the quality of covariance kernels for PDE learning.

NEApr 4, 2020
Rational neural networks

Nicolas Boullé, Yuji Nakatsukasa, Alex Townsend

We consider neural networks with rational activation functions. The choice of the nonlinear activation function in deep learning architectures is crucial and heavily impacts the performance of a neural network. We establish optimal bounds in terms of network complexity and prove that rational neural networks approximate smooth functions more efficiently than ReLU networks with exponentially smaller depth. The flexibility and smoothness of rational activation functions make them an attractive alternative to ReLU, as we demonstrate with numerical experiments.

SPJul 26, 2019
Classification of chaotic time series with deep learning

Nicolas Boullé, Vassilios Dallas, Yuji Nakatsukasa et al.

We use standard deep neural networks to classify univariate time series generated by discrete and continuous dynamical systems based on their chaotic or non-chaotic behaviour. Our approach to circumvent the lack of precise models for some of the most challenging real-life applications is to train different neural networks on a data set from a dynamical system with a basic or low-dimensional phase space and then use these networks to classify univariate time series of a dynamical system with more intricate or high-dimensional phase space. We illustrate this generalisation approach using the logistic map, the sine-circle map, the Lorenz system, and the Kuramoto--Sivashinsky equation. We observe that a convolutional neural network without batch normalization layers outperforms state-of-the-art neural networks for time series classification and is able to generalise and classify time series as chaotic or not with high accuracy.