NA · Feb 5, 2019
Scalable matrix-free adaptive product-convolution approximation for locally translation-invariant operators
Nick Alger, Vishwas Rao, Aaron Myers et al.
We present an adaptive grid matrix-free operator approximation scheme based on a "product-convolution" interpolation of convolution operators. This scheme is appropriate for operators that are locally translation-invariant, even if these operators are high-rank or full-rank. Such operators arise in Schur complement methods for solving partial differential equations (PDEs), as Hessians in PDE-constrained optimization and inverse problems, as integral operators, as covariance operators, and as Dirichlet-to-Neumann maps. Constructing the approximation requires computing the impulse responses of the operator to point sources centered on nodes in an adaptively refined grid of sample points. A randomized a posteriori error estimator drives the adaptivity. Once constructed, the approximation can be efficiently applied to vectors using the fast Fourier transform. The approximation can also be efficiently converted to hierarchical matrix ($H$-matrix) format, then inverted or factorized using scalable $H$-matrix arithmetic. The quality of the approximation degrades gracefully as fewer sample points are used, allowing cheap lower-quality approximations to be used as preconditioners. This yields an automated method to construct preconditioners for locally translation-invariant Schur complements. We directly address issues related to boundaries and prove that our scheme eliminates boundary artifacts. We test the scheme on a spatially varying blurring kernel, on the non-local component of an interface Schur complement for the Poisson operator, and on the data misfit Hessian for an advection-dominated advection-diffusion inverse problem. Numerical results show that the scheme outperforms existing methods.
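The core idea of applying impulse responses with the FFT can be illustrated with a minimal 1D sketch. This is not the authors' implementation: it assumes a periodic domain, a fixed (non-adaptive) set of sample points, and piecewise-linear interpolation weights, and it omits the error estimator and boundary handling described in the abstract. All function names are hypothetical.

```python
import numpy as np

def centered_impulse_response(apply_operator, n, p):
    """Apply the operator to a point source at index p, then shift the response back to the origin."""
    delta = np.zeros(n)
    delta[p] = 1.0
    return np.roll(apply_operator(delta), -p)

def interpolation_weights(n, sample_points):
    """Piecewise-linear weights, one per sample point, that sum to 1 at every grid index."""
    grid = np.arange(n)
    identity = np.eye(len(sample_points))
    return [np.interp(grid, sample_points, identity[k]) for k in range(len(sample_points))]

def apply_product_convolution(impulse_responses, weights, u):
    """Compute sum_k phi_k (*) (w_k . u): pointwise product first, then circular convolution via FFT."""
    result = np.zeros(len(u))
    for phi_k, w_k in zip(impulse_responses, weights):
        result += np.real(np.fft.ifft(np.fft.fft(phi_k) * np.fft.fft(w_k * u)))
    return result

# Illustrative check with a fully translation-invariant (circulant) blur,
# for which the product-convolution form reproduces the operator exactly.
n = 64
index = np.arange(n)
circular_distance = np.minimum(index, n - index)
kernel = np.exp(-0.5 * (circular_distance / 3.0) ** 2)

def apply_operator(u):
    return np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(u)))

sample_points = [8, 24, 40, 56]
responses = [centered_impulse_response(apply_operator, n, p) for p in sample_points]
weights = interpolation_weights(n, sample_points)

u = np.sin(2 * np.pi * index / n) + 0.1 * index
approx = apply_product_convolution(responses, weights, u)
exact = apply_operator(u)
```

For an operator that is only locally translation-invariant, the impulse responses differ between sample points, and the weights blend them; the approximation is then no longer exact, and its error depends on how many sample points are used.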
63.8 · NA · Mar 22
Tucker Tensor Train Taylor Series
Nick Alger, Blake Christierson, Peng Chen et al.
We present methods for constructing Taylor series surrogate models for covariance preconditioned high dimensional mappings that depend implicitly on the solution of a system of nonlinear equations, e.g., the solution of a partial differential equation. Taylor series are traditionally considered intractable for such mappings because the derivative tensors are enormous, and are only accessible through "probing" (contraction of the tensor with vectors in all but one index). We overcome these challenges using a "Tucker tensor train Taylor series" (T4S) surrogate model, in which each derivative tensor is approximated by a Tucker decomposition composed with a tensor train. After an initial dimension reduction, Tucker tensor trains are fit to directionally symmetric tensor probes using Riemannian manifold optimization within a rank continuation scheme. The optimization is enabled by fast sweeping methods for applying the Riemannian Jacobian (the Jacobian for the Tucker tensor train fitting problem) and its transpose to vectors. We justify the T4S model theoretically, and provide numerical evidence for the effectiveness of the proposed methods.
OC · Feb 7, 2020
Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization
Thomas O'Leary-Roseberry, Nick Alger, Omar Ghattas
In modern deep learning, highly subsampled stochastic approximation (SA) methods are preferred to sample average approximation (SAA) methods because of large data sets as well as generalization properties. Additionally, due to the perceived costs of forming and factorizing Hessians, second-order methods are not used for these problems. In this work we motivate the extension of Newton methods to the SA regime, and argue for the use of the scalable low rank saddle free Newton (LRSFN) method, which avoids forming the Hessian in favor of making a low rank approximation. Additionally, LRSFN can facilitate fast escape from indefinite regions, leading to better optimization solutions. In the SA setting, iterative updates are dominated by stochastic noise, and stability of the method is key. We introduce a continuous time stability analysis framework, and use it to demonstrate that stochastic errors for Newton methods can be greatly amplified by ill-conditioned Hessians. The LRSFN method mitigates this stability issue via Levenberg-Marquardt damping. More generally, however, the analysis shows that second-order methods with stochastic Hessian and gradient information may need to take small steps, unlike in deterministic problems. Numerical results show that LRSFN can escape indefinite regions that other methods struggle with, and that even under restrictive step length conditions, LRSFN can outperform popular first-order methods on large scale deep learning tasks in terms of generalizability for equivalent computational work.
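The following minimal sketch shows one way a damped saddle-free step can be computed from a low rank Hessian approximation without forming any dense matrix. The abstract does not state the update formula, so the form used here is an assumption: the step is taken as -(V|Λ|Vᵀ + γI)⁻¹ g, where V holds r orthonormal approximate eigenvectors, Λ the matching eigenvalues, and γ > 0 is the Levenberg-Marquardt damping. The function name `lrsfn_step` is hypothetical.

```python
import numpy as np

def lrsfn_step(eigvals, eigvecs, grad, damping):
    """Assumed form of the step: -(V |Lambda| V^T + damping * I)^{-1} grad.

    eigvecs must have orthonormal columns; only O(n r) work is needed.
    """
    coeffs = eigvecs.T @ grad
    # Inside the dominant subspace, scale by 1 / (|lambda_i| + damping).
    in_span = eigvecs @ (coeffs / (np.abs(eigvals) + damping))
    # Outside it, only the damping term acts.
    out_of_span = (grad - eigvecs @ coeffs) / damping
    return -(in_span + out_of_span)

# Illustrative check against a dense solve on a small random problem.
rng = np.random.default_rng(0)
n, r = 50, 5
V, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal columns
lam = np.array([4.0, -3.0, 2.0, -1.0, 0.5])       # mixed signs: an indefinite region
g = rng.standard_normal(n)
gamma = 0.1

step = lrsfn_step(lam, V, g, gamma)
dense = V @ np.diag(np.abs(lam)) @ V.T + gamma * np.eye(n)
reference = -np.linalg.solve(dense, g)
```

Taking absolute values of the eigenvalues makes the damped matrix positive definite, so the step is a descent direction even where the Hessian is indefinite. In practice the eigenpairs would come from a matrix-free method (for example, a randomized eigensolver using Hessian-vector products), which this sketch does not include.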
NA · Aug 2, 2017
A data scalable augmented Lagrangian KKT preconditioner for large scale inverse problems
Nick Alger, Umberto Villa, Tan Bui-Thanh et al.
Current state-of-the-art preconditioners for the reduced Hessian and the Karush-Kuhn-Tucker (KKT) operator for large scale inverse problems are typically based on approximating the reduced Hessian with the regularization operator. However, the quality of this approximation degrades with increasingly informative observations or data. Thus the best-case scenario from a scientific standpoint (fully informative data) is the worst-case scenario from a computational perspective. In this paper we present an augmented Lagrangian-type preconditioner based on a block diagonal approximation of the augmented upper left block of the KKT operator. The preconditioner requires solvers for two linear subproblems that arise in the augmented KKT operator, which we expect to be much easier to precondition than the reduced Hessian. Analysis of the spectrum of the preconditioned KKT operator indicates that the preconditioner is effective when the regularization is chosen appropriately. In particular, it is effective when the regularization does not over-penalize highly informed parameter modes and does not under-penalize uninformed modes. Finally, we present a numerical study for a large data/low noise Poisson source inversion problem, demonstrating the effectiveness of the preconditioner. In this example, three MINRES iterations on the KKT system with our preconditioner yield a reconstruction with better accuracy than 50 iterations of CG on the reduced Hessian system with regularization preconditioning.