Anil Damle

NA
15papers
339citations
Novelty54%
AI Score43

15 Papers

OCSep 21, 2014
Linear Hamilton Jacobi Bellman Equations in High Dimensions

Matanya B. Horowitz, Anil Damle, Joel W. Burdick

The Hamilton Jacobi Bellman Equation (HJB) provides the globally optimal solution to large classes of control problems. Unfortunately, this generality comes at a price, the calculation of such solutions is typically intractible for systems with more than moderate state space size due to the curse of dimensionality. This work combines recent results in the structure of the HJB, and its reduction to a linear Partial Differential Equation (PDE), with methods based on low rank tensor representations, known as a separated representations, to address the curse of dimensionality. The result is an algorithm to solve optimal control problems which scales linearly with the number of states in a system, and is applicable to systems that are nonlinear with stochastic forcing in finite-horizon, average cost, and first-exit settings. The method is demonstrated on inverted pendulum, VTOL aircraft, and quadcopter models, with system dimension two, six, and twelve respectively.

56.6NAApr 2
Linear Systems and Eigenvalue Problems: Open Questions from a Simons Workshop

Noah Amsel, Yves Baumann, Paul Beckman et al. · berkeley

This document presents a series of open questions arising in matrix computations, i.e., the numerical solution of linear algebra problems. It is a result of working groups at the workshop Linear Systems and Eigenvalue Problems, which was organized at the Simons Institute for the Theory of Computing program on Complexity and Linear Algebra in Fall 2025. The complexity and numerical solution of linear algebra problems is a crosscutting area between theoretical computer science and numerical analysis. The value of the particular problem formulations here is that they were produced via discussions between researchers from both groups. The open questions are organized in five categories: iterative solvers for linear systems, eigenvalue computation, low-rank approximation, randomized sketching, and other areas including tensors, quantum systems, and matrix functions.

NAJan 6, 2017
A recursive skeletonization factorization based on strong admissibility

Victor Minden, Kenneth L. Ho, Anil Damle et al.

We introduce the strong recursive skeletonization factorization (RS-S), a new approximate matrix factorization based on recursive skeletonization for solving discretizations of linear integral equations associated with elliptic partial differential equations in two and three dimensions (and other matrices with similar hierarchical rank structure). Unlike previous skeletonization-based factorizations, RS-S uses a simple modification of skeletonization, strong skeletonization, which compresses only far-field interactions. This leads to an approximate factorization in the form of a product of many block unit-triangular matrices that may be used as a preconditioner or moderate-accuracy direct solver, with dramatically reduced rank growth. We further combine the strong skeletonization procedure with alternating near-field compression to obtain the hybrid recursive skeletonization factorization (RS-WS), a modification of RS-S that exhibits reduced storage cost in many settings. Under suitable rank assumptions both RS-S and RS-WS exhibit linear computational complexity, which we demonstrate with a number of numerical examples.

NANov 2, 2015
A technique for updating hierarchical skeletonization-based factorizations of integral operators

Victor Minden, Anil Damle, Kenneth L. Ho et al.

We present a method for updating certain hierarchical factorizations for solving linear integral equations with elliptic kernels. In particular, given a factorization corresponding to some initial geometry or material parameters, we can locally perturb the geometry or coefficients and update the initial factorization to reflect this change with asymptotic complexity that is polylogarithmic in the total number of unknowns and linear in the number of perturbed unknowns. We apply our method to the recursive skeletonization factorization and hierarchical interpolative factorization and demonstrate scaling results for a number of different 2D problem setups.

COMP-PHJan 25, 2018
Variational formulation for Wannier functions with entangled band structure

Anil Damle, Antoine Levitt, Lin Lin

Wannier functions provide a localized representation of spectral subspaces of periodic Hamiltonians, and play an important role for interpreting and accelerating Hartree-Fock and Kohn-Sham density functional theory calculations in quantum physics and chemistry. For systems with isolated band structure, the existence of exponentially localized Wannier functions and numerical algorithms for finding them are well studied. In contrast, for systems with entangled band structure, Wannier functions must be generalized to span a subspace larger than the spectral subspace of interest to achieve favorable spatial locality. In this setting, little is known about the theoretical properties of these Wannier functions, and few algorithms can find them robustly. We develop a variational formulation to compute these generalized maximally localized Wannier functions. When paired with an initial guess based on the selected columns of the density matrix (SCDM) method, our method can robustly find Wannier functions for systems with entangled band structure. We formulate the problem as a constrained nonlinear optimization problem, and show how the widely used disentanglement procedure can be interpreted as a splitting method to approximately solve this problem. We demonstrate the performance of our method using real materials including silicon, copper, and aluminum. To examine more precisely the localization properties of Wannier functions, we study the free electron gas in one and two dimensions, where we show that the maximally-localized Wannier functions only decay algebraically. We also explain using a one dimensional example how to modify them to obtain super-algebraic decay.

MLMay 31, 2022
Communication-efficient distributed eigenspace estimation with arbitrary node failures

Vasileios Charisopoulos, Anil Damle

We develop an eigenspace estimation algorithm for distributed environments with arbitrary node failures, where a subset of computing nodes can return structurally valid but otherwise arbitrarily chosen responses. Notably, this setting encompasses several important scenarios that arise in distributed computing and data-collection environments such as silent/soft errors, outliers or corrupted data at certain nodes, and adversarial responses. Our estimator builds upon and matches the performance of a recently proposed non-robust estimator up to an additive $\tilde{O}(σ\sqrtα)$ error, where $σ^2$ is the variance of the existing estimator and $α$ is the fraction of corrupted nodes.

LGFeb 8, 2022Code
Linear Time Kernel Matrix Approximation via Hyperspherical Harmonics

John Paul Ryan, Anil Damle

We propose a new technique for constructing low-rank approximations of matrices that arise in kernel methods for machine learning. Our approach pairs a novel automatically constructed analytic expansion of the underlying kernel function with a data-dependent compression step to further optimize the approximation. This procedure works in linear time and is applicable to any isotropic kernel. Moreover, our method accepts the desired error tolerance as input, in contrast to prevalent methods which accept the rank as input. Experimental results show our approach compares favorably to the commonly used Nystrom method with respect to both accuracy for a given rank and computational time for a given accuracy across a variety of kernels, dimensions, and datasets. Notably, in many of these problem settings our approach produces near-optimal low-rank approximations. We provide an efficient open-source implementation of our new technique to complement our theoretical developments and experimental results.

LGJul 30, 2021
Model Preserving Compression for Neural Networks

Jerry Chee, Megan Renz, Anil Damle et al.

After training complex deep learning models, a common task is to compress the model to reduce compute and storage demands. When compressing, it is desirable to preserve the original model's per-example decisions (e.g., to go beyond top-1 accuracy or preserve robustness), maintain the network's structure, automatically determine per-layer compression levels, and eliminate the need for fine tuning. No existing compression methods simultaneously satisfy these criteria $\unicode{x2014}$ we introduce a principled approach that does by leveraging interpolative decompositions. Our approach simultaneously selects and eliminates channels (analogously, neurons), then constructs an interpolation matrix that propagates a correction into the next layer, preserving the network's structure. Consequently, our method achieves good performance even without fine tuning and admits theoretical analysis. Our theoretical generalization bound for a one layer network lends itself naturally to a heuristic that allows our method to automatically choose per-layer sizes for deep networks. We demonstrate the efficacy of our approach with strong empirical performance on a variety of tasks, models, and datasets $\unicode{x2014}$ from simple one-hidden-layer networks to deep networks on ImageNet.

LGJun 8, 2021
The Fast Kernel Transform

John Paul Ryan, Sebastian Ament, Carla P. Gomes et al.

Kernel methods are a highly effective and widely used collection of modern machine learning algorithms. A fundamental limitation of virtually all such methods are computations involving the kernel matrix that naively scale quadratically (e.g., constructing the kernel matrix and matrix-vector multiplication) or cubically (solving linear systems) with the size of the data set $N.$ We propose the Fast Kernel Transform (FKT), a general algorithm to compute matrix-vector multiplications (MVMs) for datasets in moderate dimensions with quasilinear complexity. Typically, analytically grounded fast multiplication methods require specialized development for specific kernels. In contrast, our scheme is based on auto-differentiation and automated symbolic computations that leverage the analytical structure of the underlying kernel. This allows the FKT to be easily applied to a broad class of kernels, including Gaussian, Matern, and Rational Quadratic covariance functions and physically motivated Green's functions, including those of the Laplace and Helmholtz equations. Furthermore, the FKT maintains a high, quantifiable, and controllable level of accuracy -- properties that many acceleration methods lack. We illustrate the efficacy and versatility of the FKT by providing timing and accuracy benchmarks and by applying it to scale the stochastic neighborhood embedding (t-SNE) and Gaussian processes to large real-world data sets.

NAOct 29, 2020
Over-parametrized neural networks as under-determined linear systems

Austin R. Benson, Anil Damle, Alex Townsend

We draw connections between simple neural networks and under-determined linear systems to comprehensively explore several interesting theoretical questions in the study of neural networks. First, we emphatically show that it is unsurprising such networks can achieve zero training loss. More specifically, we provide lower bounds on the width of a single hidden layer neural network such that only training the last linear layer suffices to reach zero training loss. Our lower bounds grow more slowly with data set size than existing work that trains the hidden layer weights. Second, we show that kernels typically associated with the ReLU activation function have fundamental flaws -- there are simple data sets where it is impossible for widely studied bias-free models to achieve zero training loss irrespective of how the parameters are chosen or trained. Lastly, our analysis of gradient descent clearly illustrates how spectral properties of certain matrices impact both the early iteration and long-term training behavior. We propose new activation functions that avoid the pitfalls of ReLU in that they admit zero training loss solutions for any set of distinct data points and experimentally exhibit favorable spectral properties.

MLSep 5, 2020
Communication-efficient distributed eigenspace estimation

Vasileios Charisopoulos, Austin R. Benson, Anil Damle

Distributed computing is a standard way to scale up machine learning and data science algorithms to process large amounts of data. In such settings, avoiding communication amongst machines is paramount for achieving high performance. Rather than distribute the computation of existing algorithms, a common practice for avoiding communication is to compute local solutions or parameter estimates on each machine and then combine the results; in many convex optimization problems, even simple averaging of local solutions can work well. However, these schemes do not work when the local solutions are not unique. Spectral methods are a collection of such problems, where solutions are orthonormal bases of the leading invariant subspace of an associated data matrix, which are only unique up to rotation and reflections. Here, we develop a communication-efficient distributed algorithm for computing the leading invariant subspace of a data matrix. Our algorithm uses a novel alignment scheme that minimizes the Procrustean distance between local solutions and a reference solution, and only requires a single round of communication. For the important case of principal component analysis (PCA), we show that our algorithm achieves a similar error rate to that of a centralized estimator. We present numerical experiments demonstrating the efficacy of our proposed algorithm for distributed PCA, as well as other problems where solutions exhibit rotational symmetry, such as node embeddings for graph data and spectral initialization for quadratic sensing.

LGJun 19, 2020
Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization

Geoff Pleiss, Martin Jankowiak, David Eriksson et al.

Matrix square roots and their inverses arise frequently in machine learning, e.g., when sampling from high-dimensional Gaussians $\mathcal{N}(\mathbf 0, \mathbf K)$ or whitening a vector $\mathbf b$ against covariance matrix $\mathbf K$. While existing methods typically require $O(N^3)$ computation, we introduce a highly-efficient quadratic-time algorithm for computing $\mathbf K^{1/2} \mathbf b$, $\mathbf K^{-1/2} \mathbf b$, and their derivatives through matrix-vector multiplication (MVMs). Our method combines Krylov subspace methods with a rational approximation and typically achieves $4$ decimal places of accuracy with fewer than $100$ MVMs. Moreover, the backward pass requires little additional computation. We demonstrate our method's applicability on matrices as large as $50,\!000 \times 50,\!000$ - well beyond traditional methods - with little approximation error. Applying this increased scalability to variational Gaussian processes, Bayesian optimization, and Gibbs sampling results in more powerful models with higher accuracy.

NAFeb 19, 2020
Entrywise convergence of iterative methods for eigenproblems

Vasileios Charisopoulos, Austin R. Benson, Anil Damle

Several problems in machine learning, statistics, and other fields rely on computing eigenvectors. For large scale problems, the computation of these eigenvectors is typically performed via iterative schemes such as subspace iteration or Krylov methods. While there is classical and comprehensive analysis for subspace convergence guarantees with respect to the spectral norm, in many modern applications other notions of subspace distance are more appropriate. Recent theoretical work has focused on perturbations of subspaces measured in the $\ell_{2 \to \infty}$ norm, but does not consider the actual computation of eigenvectors. Here we address the convergence of subspace iteration when distances are measured in the $\ell_{2 \to \infty}$ norm and provide deterministic bounds. We complement our analysis with a practical stopping criterion and demonstrate its applicability via numerical experiments. Our results show that one can get comparable performance on downstream tasks while requiring fewer iterations, thereby saving substantial computational time.

MESep 7, 2017
Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizations

Victor Minden, Anil Damle, Kenneth L. Ho et al.

Maximum likelihood estimation for parameter-fitting given observations from a Gaussian process in space is a computationally-demanding task that restricts the use of such methods to moderately-sized datasets. We present a framework for unstructured observations in two spatial dimensions that allows for evaluation of the log-likelihood and its gradient (i.e., the score equations) in $\tilde O(n^{3/2})$ time under certain assumptions, where $n$ is the number of observations. Our method relies on the skeletonization procedure described by Martinsson & Rokhlin in the form of the recursive skeletonization factorization of Ho & Ying. Combining this with an adaptation of the matrix peeling algorithm of Lin et al. for constructing $\mathcal{H}$-matrix representations of black-box operators, we obtain a framework that can be used in the context of any first-order optimization routine to quickly and accurately compute maximum-likelihood estimates.

NAAug 20, 2017
Computing localized representations of the kohn-sham subspace via randomization and refinement

Anil Damle, Lin Lin, Lexing Ying

Localized representation of the Kohn-Sham subspace plays an important role in quantum chemistry and materials science. The recently developed selected columns of the density matrix (SCDM) method [J. Chem. Theory Comput. 11, 1463, 2015] is a simple and robust procedure for finding a localized representation of a set of Kohn-Sham orbitals from an insulating system. The SCDM method allows the direct construction of a well conditioned (or even orthonormal) and localized basis for the Kohn-Sham subspace. The SCDM algorithm avoids the use of an optimization procedure and does not depend on any adjustable parameters. The most computationally expensive step of the SCDM method is a column pivoted QR factorization that identifies the important columns for constructing the localized basis set. In this paper, we develop a two stage approximate column selection strategy to find the important columns at much lower computational cost. We demonstrate the effectiveness of this process using the dissociation process of a BH$_3$NH$_3$ molecule, an alkane chain and a supercell with 256 water molecules. Numerical results for the large collection of water molecules show that two stage localization procedure can be more than 30 times faster than the original SCDM algorithm and compares favorably with the popular Wannier90 package.