Alex Townsend

h-index8

33papers

1,468citations

Novelty57%

AI Score56

Ranked #22,693 of 201,326 authors (top 11%)#14 in NA (top 2%)

33 Papers

NAApr 2

Linear Systems and Eigenvalue Problems: Open Questions from a Simons Workshop

Noah Amsel, Yves Baumann, Paul Beckman et al. · berkeley

This document presents a series of open questions arising in matrix computations, i.e., the numerical solution of linear algebra problems. It is a result of working groups at the workshop Linear Systems and Eigenvalue Problems, which was organized at the Simons Institute for the Theory of Computing program on Complexity and Linear Algebra in Fall 2025. The complexity and numerical solution of linear algebra problems is a crosscutting area between theoretical computer science and numerical analysis. The value of the particular problem formulations here is that they were produced via discussions between researchers from both groups. The open questions are organized in five categories: iterative solvers for linear systems, eigenvalue computation, low-rank approximation, randomized sketching, and other areas including tensors, quantum systems, and matrix functions.

NAAug 15, 2012

A fast and well-conditioned spectral method

Sheehan Olver, Alex Townsend

A spectral method is developed for the direct solution of linear ordinary differential equations with variable coefficients. The method leads to matrices which are almost banded, and a numerical solver is presented that takes O(m^2n) operations, where m is the number of Chebyshev points needed to resolve the coefficients of the differential operator and n is the number of Chebyshev coefficients needed to resolve the solution to the differential equation. We prove stability of the method by relating it to a diagonally preconditioned system which has a bounded condition number, in a suitable norm. For Dirichlet boundary conditions, this implies stability in the standard 2-norm. An adaptive QR factorization is developed to efficiently solve the resulting linear system and automatically choose the optimal number of Chebyshev coefficients needed to represent the solution. The resulting algorithm can efficiently and reliably solve for solutions that require as many as a million unknowns.

NASep 29, 2016

On the singular values of matrices with displacement structure

Bernhard Beckermann, Alex Townsend

Matrices with displacement structure such as Pick, Vandermonde, and Hankel matrices appear in a diverse range of applications. In this paper, we use an extremal problem involving rational functions to derive explicit bounds on the singular values of such matrices. For example, we show that the $k$th singular value of a real $n\times n$ positive definite Hankel matrix, $H_n$, is bounded by $Cρ^{-k/\log n}\|H\|_2$ with explicitly given constants $C>0$ and $ρ>1$, where $\|H_n\|_2$ is the spectral norm. This means that a real $n\times n$ positive definite Hankel matrix can be approximated, up to an accuracy of $ε\|H_n\|_2$ with $0<ε<1$, by a rank $\mathcal{O}(\log n\log(1/ε) )$ matrix. Analogous results are obtained for Pick, Cauchy, real Vandermonde, Löwner, and certain Krylov matrices.

NAMay 31

Transpose-free linear algebra

Diana Halikias, Michiel E. Hochstenbach, Alex Townsend

We study the limitations of matrix-free algorithms that access a matrix $A$ only through forward matrix-vector products (matvecs) $x \mapsto Ax$, without access to the transpose $A^\top$ or its action. This setting arises naturally in operator learning, inverse problems, and matrix-free PDE solvers, where adjoint evaluations may be unavailable or prohibitively expensive. We show that the lack of transpose access creates severe and sometimes insurmountable theoretical barriers. For Krylov methods, we prove that the sequence of projected operator norms produced by Arnoldi iteration can follow any prescribed nondecreasing curve, showing that forward matvecs alone provide essentially no reliable information about the spectral norm. For several core problems, including least squares, norm estimation, column subset selection, and local maximum volume, we establish non-identifiability results; distinct matrices can generate identical forward-query transcripts while having fundamentally different solutions. We also prove quantitative lower bounds on the number of forward matvecs required for approximation tasks. In particular, any algorithm that computes a near-optimal rank-$k$ approximation must use at least $n$ queries, and estimating the Frobenius norm to relative accuracy $\eps$ requires $Ω(\eps^{-2})$ queries when $n$ is sufficiently large, matching the complexity of Hutchinson-type estimators up to constants. Although some problems remain solvable without transpose access, the transpose-free setting is fundamentally more limited in both identifiability and efficiency.

NAApr 27, 2016

Fast polynomial transforms based on Toeplitz and Hankel matrices

Alex Townsend, Marcus Webb, Sheehan Olver

Many standard conversion matrices between coefficients in classical orthogonal polynomial expansions can be decomposed using diagonally-scaled Hadamard products involving Toeplitz and Hankel matrices. This allows us to derive $\smash{\mathcal{O}(N(\log N)^2)}$ algorithms, based on the fast Fourier transform, for converting coefficients of a degree $N$ polynomial in one polynomial basis to coefficients in another. Numerical results show that this approach is competitive with state-of-the-art techniques, requires no precomputational cost, can be implemented in a handful of lines of code, and is easily adapted to extended precision arithmetic.

NAApr 2, 2016

Computing with functions in spherical and polar geometries I. The sphere

Alex Townsend, Heather Wilber, Grady B. Wright

A collection of algorithms is described for numerically computing with smooth functions defined on the unit sphere. Functions are approximated to essentially machine precision by using a structure-preserving iterative variant of Gaussian elimination together with the double Fourier sphere method. We show that this procedure allows for stable differentiation, reduces the oversampling of functions near the poles, and converges for certain analytic functions. Operations such as function evaluation, differentiation, and integration are particularly efficient and can be computed by essentially one-dimensional algorithms. A highlight is an optimal complexity direct solver for Poisson's equation on the sphere using a spectral method. Without parallelization, we solve Poisson's equation with $100$ million degrees of freedom in one minute on a standard laptop. Numerical results are presented throughout. In a companion paper (part II) we extend the ideas presented here to computing with functions on the disk.

NAOct 6, 2016

Vector spaces of linearizations for matrix polynomials: a bivariate polynomial approach

Yuji Nakatsukasa, Vanni Noferini, Alex Townsend

We revisit the landmark paper [D. S. Mackey, N. Mackey, C. Mehl, and V. Mehrmann, SIAM J. Matrix Anal. Appl., 28 (2006), pp.~971--1004] and, by viewing matrices as coefficients for bivariate polynomials, we provide concise proofs for key properties of linearizations for matrix polynomials. We also show that every pencil in the double ansatz space is intrinsically connected to a Bézout matrix, which we use to prove the eigenvalue exclusion theorem. In addition our exposition allows for any polynomial basis and for any field. The new viewpoint also leads to new results. We generalize the double ansatz space by exploiting its algebraic interpretation as a space of Bézout pencils to derive new linearizations with potential applications in the theory of structured matrix polynomials. Moreover, we analyze the conditioning of double ansatz space linearizations in the important practical case of a Chebyshev basis.

ITMay 31, 2016

Stable extrapolation of analytic functions

Laurent Demanet, Alex Townsend

This paper examines the problem of extrapolation of an analytic function for $x > 1$ given perturbed samples from an equally spaced grid on $[-1,1]$. Mathematical folklore states that extrapolation is in general hopelessly ill-conditioned, but we show that a more precise statement carries an interesting nuance. For a function $f$ on $[-1,1]$ that is analytic in a Bernstein ellipse with parameter $ρ> 1$, and for a uniform perturbation level $ε$ on the function samples, we construct an asymptotically best extrapolant $e(x)$ as a least squares polynomial approximant of degree $M^*$ given explicitly. We show that the extrapolant $e(x)$ converges to $f(x)$ pointwise in the interval $I_ρ\in[1,(ρ+ρ^{-1})/2)$ as $ε\to 0$, at a rate given by a $x$-dependent fractional power of $ε$. More precisely, for each $x \in I_ρ$ we have \[ |f(x) - e(x)| = \mathcal{O}\left( ε^{-\log r(x) / \logρ} \right), \qquad\qquad r(x) = \frac{x+\sqrt{x^2-1}}ρ, \] up to log factors, provided that the oversampling conditioning is satisfied. That is, \[ M^* \leq \frac{1}{2} \sqrt{N}, \] which is known to be needed from approximation theory. In short, extrapolation enjoys a weak form of stability, up to a fraction of the characteristic smoothness length. The number of function samples, $N+1$, does not bear on the size of the extrapolation error provided that it obeys the oversampling condition. We also show that one cannot construct an asymptotically more accurate extrapolant from $N+1$ equally spaced samples than $e(x)$, using any other linear or nonlinear procedure. The proofs involve original statements on the stability of polynomial approximation in the Chebyshev basis from equally spaced samples and these are expected to be of independent interest.

NAJan 17, 2017

A nonuniform fast Fourier transform based on low rank approximation

Diego Ruiz-Antolin, Alex Townsend

By viewing the nonuniform discrete Fourier transform (NUDFT) as a perturbed version of a uniform discrete Fourier transform, we propose a fast, stable, and simple algorithm for computing the NUDFT that costs $\mathcal{O}(N\log N\log(1/ε)/\log\!\log(1/ε))$ operations based on the fast Fourier transform, where $N$ is the size of the transform and $0<ε<1$ is a working precision. Our key observation is that a NUDFT and DFT matrix divided entry-by-entry is often well-approximated by a low rank matrix, allowing us to express a NUDFT matrix as a sum of diagonally-scaled DFT matrices. Our algorithm is simple to implement, automatically adapts to any working precision, and is competitive with state-of-the-art algorithms. In the fully uniform case, our algorithm is essentially the FFT. We also describe quasi-optimal algorithms for the inverse NUDFT and two-dimensional NUDFTs.

NAOct 30, 2017

Fast Poisson solvers for spectral methods

Daniel Fortunato, Alex Townsend

Poisson's equation is the canonical elliptic partial differential equation. While there exist fast Poisson solvers for finite difference and finite element methods, fast Poisson solvers for spectral methods have remained elusive. Here, we derive spectral methods for solving Poisson's equation on a square, cylinder, solid sphere, and cube that have an optimal complexity (up to polylogarithmic terms) in terms of the degrees of freedom required to represent the solution. Whereas FFT-based fast Poisson solvers exploit structured eigenvectors of finite difference matrices, our solver exploits a separated spectra property that holds for our spectral discretizations. Without parallelization, we can solve Poisson's equation on a square with 100 million degrees of freedom in under two minutes on a standard laptop.

NAMar 27, 2017

Computing with functions in spherical and polar geometries II. The disk

Heather Wilber, Alex Townsend, Grady Wright

A collection of algorithms is described for numerically computing with smooth functions defined on the unit disk. Low rank approximations to functions in polar geometries are formed by synthesizing the disk analogue of the double Fourier sphere method with a structure-preserving variant of iterative Gaussian elimination that is shown to converge geometrically for certain analytic functions. This adaptive procedure is near-optimal in its sampling strategy, producing approximants that are stable for differentiation and facilitate the use of FFT-based algorithms in both variables. The low rank form of the approximants is especially useful for operations such as integration and differentiation, reducing them to essentially 1D procedures, and it is also exploited to formulate a new fast disk Poisson solver that computes low rank approximations to solutions. This work complements a companion paper (Part I) on computing with functions on the surface of the unit sphere.

LGFeb 24, 2023

Elliptic PDE learning is provably data-efficient

Nicolas Boullé, Diana Halikias, Alex Townsend

PDE learning is an emerging field that combines physics and machine learning to recover unknown physical systems from experimental data. While deep learning models traditionally require copious amounts of training data, recent PDE learning techniques achieve spectacular results with limited data availability. Still, these results are empirical. Our work provides theoretical guarantees on the number of input-output training pairs required in PDE learning. Specifically, we exploit randomized numerical linear algebra and PDE theory to derive a provably data-efficient algorithm that recovers solution operators of 3D uniformly elliptic PDEs from input-output data and achieves an exponential convergence rate of the error with respect to the size of the training dataset with an exceptionally high probability of success.

NAApr 27, 2022

Learning Green's functions associated with time-dependent partial differential equations

Nicolas Boullé, Seick Kim, Tianyi Shi et al.

Neural operators are a popular technique in scientific machine learning to learn a mathematical model of the behavior of unknown physical systems from data. Neural operators are especially useful to learn solution operators associated with partial differential equations (PDEs) from pairs of forcing functions and solutions when numerical solvers are not available or the underlying physics is poorly understood. In this work, we attempt to provide theoretical foundations to understand the amount of training data needed to learn time-dependent PDEs. Given input-output pairs from a parabolic PDE in any spatial dimension $n\geq 1$, we derive the first theoretically rigorous scheme for learning the associated solution operator, which takes the form of a convolution with a Green's function $G$. Until now, rigorously learning Green's functions associated with time-dependent PDEs has been a major challenge in the field of scientific machine learning because $G$ may not be square-integrable when $n>1$, and time-dependent PDEs have transient dynamics. By combining the hierarchical low-rank structure of $G$ together with randomized numerical linear algebra, we construct an approximant to $G$ that achieves a relative error of $\smash{\mathcal{O}(Γ_ε^{-1/2}ε)}$ in the $L^1$-norm with high probability by using at most $\smash{\mathcal{O}(ε^{-\frac{n+2}{2}}\log(1/ε))}$ input-output training pairs, where $Γ_ε$ is a measure of the quality of the training dataset for learning $G$, and $ε>0$ is sufficiently small.

DSAug 21, 2023

Beyond expectations: Residual Dynamic Mode Decomposition and Variance for Stochastic Dynamical Systems

Matthew J. Colbrook, Qin Li, Ryan V. Raut et al.

Koopman operators linearize nonlinear dynamical systems, making their spectral information of crucial interest. Numerous algorithms have been developed to approximate these spectral properties, and Dynamic Mode Decomposition (DMD) stands out as the poster child of projection-based methods. Although the Koopman operator itself is linear, the fact that it acts in an infinite-dimensional space of observables poses challenges. These include spurious modes, essential spectra, and the verification of Koopman mode decompositions. While recent work has addressed these challenges for deterministic systems, there remains a notable gap in verified DMD methods for stochastic systems, where the Koopman operator measures the expectation of observables. We show that it is necessary to go beyond expectations to address these issues. By incorporating variance into the Koopman framework, we address these challenges. Through an additional DMD-type matrix, we approximate the sum of a squared residual and a variance term, each of which can be approximated individually using batched snapshot data. This allows verified computation of the spectral properties of stochastic Koopman operators, controlling the projection error. We also introduce the concept of variance-pseudospectra to gauge statistical coherency. Finally, we present a suite of convergence results for the spectral information of stochastic Koopman operators. Our study concludes with practical applications using both simulated and experimental data. In neural recordings from awake mice, we demonstrate how variance-pseudospectra can reveal physiologically significant information unavailable to standard expectation-based dynamical models.

NAJan 11, 2018

On the singular values of matrices with high displacement rank

Alex Townsend, Heather Wilber

We introduce a new ADI-based low rank solver for $AX-XB=F$, where $F$ has rapidly decaying singular values. Our approach results in both theoretical and practical gains, including (1) the derivation of new bounds on singular values for classes of matrices with high displacement rank, (2) a practical algorithm for solving certain Lyapunov and Sylvester matrix equations with high rank right-hand sides, and (3) a collection of low rank Poisson solvers that achieve spectral accuracy and optimal computational complexity.

NAFeb 13, 2019

A sparse spectral method on triangles

Sheehan Olver, Alex Townsend, Geoff Vasil

In this paper, we demonstrate that many of the computational tools for univariate orthogonal polynomials have analogues for a family of bivariate orthogonal polynomials on the triangle, including Clenshaw's algorithm and sparse differentiation operators. This allows us to derive a practical spectral method for solving linear partial differential equations on triangles with sparse discretizations. We can thereby rapidly solve partial differential equations using polynomials with degrees in the thousands, resulting in sparse discretizations with as many as several million degrees of freedom.

NAFeb 21, 2016

Gaussian elimination corrects pivoting mistakes

Alex Townsend

Gaussian elimination (GE) is the archetypal direct algorithm for solving linear systems of equations and this has been its primary application for thousands of years. In the last decade, GE has found another major use as an iterative algorithm for low rank approximation. In this setting, GE is often employed with complete pivoting and designed to allow for non-optimal pivoting, i.e., pivoting mistakes, that could render GE numerically unstable when implemented in floating point arithmetic. While it may appear that pivoting mistakes could accumulate and lead to a large growth factor, we show that later GE steps correct earlier pivoting mistakes, even while more are being made. In short, GE is very robust to non-optimal pivots, allowing for its iterative variant to flourish.

NAApr 19, 2018

Continuous analogues of Krylov methods for differential operators

Marc Aurèle Gilles, Alex Townsend

Analogues of the conjugate gradient method, MINRES, and GMRES are derived for solving boundary value problems (BVPs) involving second-order differential operators. Two challenges arise: imposing the boundary conditions on the solution while building up a Krylov subspace, and guaranteeing convergence of the Krylov-based method on unbounded operators. Our approach employs projection operators to guarantee that the boundary conditions are satisfied, and we develop an operator preconditioner that ensures that an approximate solution is computed after a finite number of iterations. The developed Krylov methods are practical iterative BVP solvers that are particularly efficient when a fast operator-function product is available.

NAApr 11

Oblivious Subspace Injection Is Not Enough for Relative Error

Alex Townsend, Chris Wang

Oblivious subspace injection (OSI) was introduced by Camaño, Epperly, Meyer, and Tropp in 2025 as a much weaker sketching property than oblivious subspace embedding (OSE) that still yields constant-factor guarantees for randomized low-rank approximation and sketch-and-solve least-squares regression. At the Simons Institute in Berkeley during a workshop in October 2025, it was asked whether OSIs also imply relative error bounds rather than just constant-factor guarantees. We show that, from a theoretical standpoint, OSI alone does not yield OSE-style relative-error guarantees whose failure probability is controlled solely by the OSI failure parameter, even though OSI sketches often perform extremely well in practice. We provide counterexamples showing this for sketch-and-solve least squares and for randomized SVD in the Frobenius norm. The missing ingredient from a sketch satisfying only OSI is upper control on the optimal residual or tail component, and when one ensures the sketch has this additional property, a near-relative-error bound is recovered. We also show that there is a natural $\ell_p$ analogue of OSI giving constant-factor sketch-and-solve bounds.

LGMay 28, 2022

Tuning Frequency Bias in Neural Network Training with Nonuniform Data

Annan Yu, Yunan Yang, Alex Townsend

Small generalization errors of over-parameterized neural networks (NNs) can be partially explained by the frequency biasing phenomenon, where gradient-based algorithms minimize the low-frequency misfit before reducing the high-frequency residuals. Using the Neural Tangent Kernel (NTK), one can provide a theoretically rigorous analysis for training where data are drawn from constant or piecewise-constant probability densities. Since most training data sets are not drawn from such distributions, we use the NTK model and a data-dependent quadrature rule to theoretically quantify the frequency biasing of NN training given fully nonuniform data. By replacing the loss function with a carefully selected Sobolev norm, we can further amplify, dampen, counterbalance, or reverse the intrinsic frequency biasing in NN training.

NADec 22, 2023

A Mathematical Guide to Operator Learning

Nicolas Boullé, Alex Townsend

Operator learning aims to discover properties of an underlying dynamical system or partial differential equation (PDE) from data. Here, we present a step-by-step guide to operator learning. We explain the types of problems and PDEs amenable to operator learning, discuss various neural network architectures, and explain how to employ numerical PDE solvers effectively. We also give advice on how to create and manage training data and conduct optimization. We offer intuition behind the various neural network architectures employed in operator learning by motivating them from the point-of-view of numerical linear algebra.

LGFeb 12

Rational Neural Networks have Expressivity Advantages

Maosen Tang, Alex Townsend

We study neural networks with trainable low-degree rational activation functions and show that they are more expressive and parameter-efficient than modern piecewise-linear and smooth activations such as ELU, LeakyReLU, LogSigmoid, PReLU, ReLU, SELU, CELU, Sigmoid, SiLU, Mish, Softplus, Tanh, Softmin, Softmax, and LogSoftmax. For an error target of $\varepsilon>0$, we establish approximation-theoretic separations: Any network built from standard fixed activations can be uniformly approximated on compact domains by a rational-activation network with only $\mathrm{poly}(\log\log(1/\varepsilon))$ overhead in size, while the converse provably requires $Ω(\log(1/\varepsilon))$ parameters in the worst case. This exponential gap persists at the level of full networks and extends to gated activations and transformer-style nonlinearities. In practice, rational activations integrate seamlessly into standard architectures and training pipelines, allowing rationals to match or outperform fixed activations under identical architectures and optimizers.

NAJan 31, 2024

Operator learning without the adjoint

Nicolas Boullé, Diana Halikias, Samuel E. Otto et al.

There is a mystery at the heart of operator learning: how can one recover a non-self-adjoint operator from data without probing the adjoint? Current practical approaches suggest that one can accurately recover an operator while only using data generated by the forward action of the operator without access to the adjoint. However, naively, it seems essential to sample the action of the adjoint. In this paper, we partially explain this mystery by proving that without querying the adjoint, one can approximate a family of non-self-adjoint infinite-dimensional compact operators via projection onto a Fourier basis. We then apply the result to recovering Green's functions of elliptic partial differential operators and derive an adjoint-free sample complexity bound. While existing theory justifies low sample complexity in operator learning, ours is the first adjoint-free analysis that attempts to close the gap between theory and practice.

NADec 29, 2023

Operator learning for hyperbolic partial differential equations

Christopher Wang, Alex Townsend

We construct the first rigorously justified probabilistic algorithm for recovering the solution operator of a hyperbolic partial differential equation (PDE) in two variables from input-output training pairs. The primary challenge of recovering the solution operator of hyperbolic PDEs is the presence of characteristics, along which the associated Green's function is discontinuous. Therefore, a central component of our algorithm is a rank detection scheme that identifies the approximate location of the characteristics. By combining the randomized singular value decomposition with an adaptive hierarchical partition of the domain, we construct an approximant to the solution operator using $O(Ψ_ε^{-1}ε^{-7}\log(Ξ_ε^{-1}ε^{-1}))$ input-output pairs with relative error $O(Ξ_ε^{-1}ε)$ in the operator norm as $ε\to0$, with high probability. Here, $Ψ_ε$ represents the existence of degenerate singular values of the solution operator, and $Ξ_ε$ measures the quality of the training data. Our assumptions on the regularity of the coefficients of the hyperbolic PDE are relatively weak given that hyperbolic PDEs do not have the ``instantaneous smoothing effect'' of elliptic and parabolic PDEs, and our recovery rate improves as the regularity of the coefficients increases.

NANov 29, 2021

Rigorous data-driven computation of spectral properties of Koopman operators for dynamical systems

Matthew J. Colbrook, Alex Townsend

Koopman operators are infinite-dimensional operators that globally linearize nonlinear dynamical systems, making their spectral information valuable for understanding dynamics. However, Koopman operators can have continuous spectra and infinite-dimensional invariant subspaces, making computing their spectral information a considerable challenge. This paper describes data-driven algorithms with rigorous convergence guarantees for computing spectral information of Koopman operators from trajectory data. We introduce residual dynamic mode decomposition (ResDMD), which provides the first scheme for computing the spectra and pseudospectra of general Koopman operators from snapshot data without spectral pollution. Using the resolvent operator and ResDMD, we compute smoothed approximations of spectral measures associated with general measure-preserving dynamical systems. We prove explicit convergence theorems for our algorithms, which can achieve high-order convergence even for chaotic systems when computing the density of the continuous spectrum and the discrete spectrum. Since our algorithms come with error control, ResDMD allows aposteri verification of spectral quantities, Koopman mode decompositions, and learned dictionaries. We demonstrate our algorithms on the tent map, circle rotations, Gauss iterated map, nonlinear pendulum, double pendulum, and Lorenz system. Finally, we provide kernelized variants of our algorithms for dynamical systems with a high-dimensional state space. This allows us to compute the spectral measure associated with the dynamics of a protein molecule with a 20,046-dimensional state space and compute nonlinear Koopman modes with error bounds for turbulent flow past aerofoils with Reynolds number $>10^5$ that has a 295,122-dimensional state space.

LGSep 23, 2021

Arbitrary-Depth Universal Approximation Theorems for Operator Neural Networks

Annan Yu, Chloé Becquey, Diana Halikias et al.

The standard Universal Approximation Theorem for operator neural networks (NNs) holds for arbitrary width and bounded depth. Here, we prove that operator NNs of bounded width and arbitrary depth are universal approximators for continuous nonlinear operators. In our main result, we prove that for non-polynomial activation functions that are continuously differentiable at a point with a nonzero derivative, one can construct an operator NN of width five, whose inputs are real numbers with finite decimal representations, that is arbitrarily close to any given continuous nonlinear operator. We derive an analogous result for non-affine polynomial activation functions. We also show that depth has theoretical advantages by constructing operator ReLU NNs of depth $2k^3+8$ and constant width that cannot be well-approximated by any operator ReLU NN of depth $k$, unless its width is exponential in $k$.

NAMay 27, 2021

A generalization of the randomized singular value decomposition

Nicolas Boullé, Alex Townsend

The randomized singular value decomposition (SVD) is a popular and effective algorithm for computing a near-best rank $k$ approximation of a matrix $A$ using matrix-vector products with standard Gaussian vectors. Here, we generalize the randomized SVD to multivariate Gaussian vectors, allowing one to incorporate prior knowledge of $A$ into the algorithm. This enables us to explore the continuous analogue of the randomized SVD for Hilbert--Schmidt (HS) operators using operator-function products with functions drawn from a Gaussian process (GP). We then construct a new covariance kernel for GPs, based on weighted Jacobi polynomials, which allows us to rapidly sample the GP and control the smoothness of the randomly generated functions. Numerical examples on matrices and HS operators demonstrate the applicability of the algorithm.

LGMay 1, 2021

Data-driven discovery of Green's functions with human-understandable deep learning

Nicolas Boullé, Christopher J. Earls, Alex Townsend

There is an opportunity for deep learning to revolutionize science and technology by revealing its findings in a human interpretable manner. To do this, we develop a novel data-driven approach for creating a human-machine partnership to accelerate scientific discovery. By collecting physical system responses under excitations drawn from a Gaussian process, we train rational neural networks to learn Green's functions of hidden linear partial differential equations. These functions reveal human-understandable properties and features, such as linear conservation laws and symmetries, along with shock and singularity locations, boundary effects, and dominant modes. We illustrate the technique on several examples and capture a range of physics, including advection-diffusion, viscous shocks, and Stokes flow in a lid-driven cavity.

NAJan 31, 2021

Learning elliptic partial differential equations with randomized linear algebra

Nicolas Boullé, Alex Townsend

Given input-output pairs of an elliptic partial differential equation (PDE) in three dimensions, we derive the first theoretically-rigorous scheme for learning the associated Green's function $G$. By exploiting the hierarchical low-rank structure of $G$, we show that one can construct an approximant to $G$ that converges almost surely and achieves a relative error of $\mathcal{O}(Γ_ε^{-1/2}\log^3(1/ε)ε)$ using at most $\mathcal{O}(ε^{-6}\log^4(1/ε))$ input-output training pairs with high probability, for any $0<ε<1$. The quantity $0<Γ_ε\leq 1$ characterizes the quality of the training dataset. Along the way, we extend the randomized singular value decomposition algorithm for learning matrices to Hilbert--Schmidt operators and characterize the quality of covariance kernels for PDE learning.

NAOct 29, 2020

Over-parametrized neural networks as under-determined linear systems

Austin R. Benson, Anil Damle, Alex Townsend

We draw connections between simple neural networks and under-determined linear systems to comprehensively explore several interesting theoretical questions in the study of neural networks. First, we emphatically show that it is unsurprising such networks can achieve zero training loss. More specifically, we provide lower bounds on the width of a single hidden layer neural network such that only training the last linear layer suffices to reach zero training loss. Our lower bounds grow more slowly with data set size than existing work that trains the hidden layer weights. Second, we show that kernels typically associated with the ReLU activation function have fundamental flaws -- there are simple data sets where it is impossible for widely studied bias-free models to achieve zero training loss irrespective of how the parameters are chosen or trained. Lastly, our analysis of gradient descent clearly illustrates how spectral properties of certain matrices impact both the early iteration and long-term training behavior. We propose new activation functions that avoid the pitfalls of ReLU in that they admit zero training loss solutions for any set of distinct data points and experimentally exhibit favorable spectral properties.

NEApr 4, 2020

Rational neural networks

Nicolas Boullé, Yuji Nakatsukasa, Alex Townsend

We consider neural networks with rational activation functions. The choice of the nonlinear activation function in deep learning architectures is crucial and heavily impacts the performance of a neural network. We establish optimal bounds in terms of network complexity and prove that rational neural networks approximate smooth functions more efficiently than ReLU networks with exponentially smaller depth. The flexibility and smoothness of rational activation functions make them an attractive alternative to ReLU, as we demonstrate with numerical experiments.

NASep 22, 2018

Chebyshev approximation and the global geometry of sloppy models

Katherine N. Quinn, Heather Wilber, Alex Townsend et al.

Sloppy models are complex nonlinear models with outcomes that are significantly affected by only a small subset of parameter combinations. Despite forming an important universality class and arising frequently in practice, formal and systematic explanations of sloppiness are lacking. By unifying geometric interpretations of sloppiness with Chebyshev approximation theory, we offer such an explanation, and show how sloppiness can be described explicitly in terms of model smoothness. Our approach results in universal bounds on model predictions for classes of smooth models, and our bounds capture global geometric features that are intrinsic to their model manifolds. We illustrate these ideas using three disparate models: exponential decay, reaction rates from an enzyme-catalysed chemical reaction, and an epidemiology model of an infected population.

LGMay 21, 2017

Why are Big Data Matrices Approximately Low Rank?

Madeleine Udell, Alex Townsend

Matrices of (approximate) low rank are pervasive in data science, appearing in recommender systems, movie preferences, topic models, medical records, and genomics. While there is a vast literature on how to exploit low rank structure in these datasets, there is less attention on explaining why the low rank structure appears in the first place. Here, we explain the effectiveness of low rank models in data science by considering a simple generative model for these matrices: we suppose that each row or column is associated to a (possibly high dimensional) bounded latent variable, and entries of the matrix are generated by applying a piecewise analytic function to these latent variables. These matrices are in general full rank. However, we show that we can approximate every entry of an $m \times n$ matrix drawn from this model to within a fixed absolute error by a low rank matrix whose rank grows as $\mathcal O(\log(m + n))$. Hence any sufficiently large matrix from such a latent variable model can be approximated, up to a small entrywise error, by a low rank matrix.