NAFeb 16, 2018
SBP-SAT finite difference discretization of acoustic wave equations on staggered block-wise uniform gridsLongfei Gao, David C. Del Rey Fernandez, Mark Carpenter et al.
We consider the numerical simulation of the acoustic wave equations arising from seismic applications, for which staggered grid finite difference methods are popular choices due to their simplicity and efficiency. We relax the uniform grid restriction on finite difference methods and allow the grids to be block-wise uniform with nonconforming interfaces. In doing so, variations in the wave speeds of the subterranean media can be accounted for more efficiently. Staggered grid finite difference operators satisfying the summation-by-parts (SBP) property are devised to approximate the spatial derivatives appearing in the acoustic wave equation. These operators are applied within each block independently. The coupling between blocks is achieved through simultaneous approximation terms (SATs), which impose the interface condition weakly, i.e., by penalty. Ratio of the grid spacing of neighboring blocks is allowed to be rational number, for which specially designed interpolation formulas are presented. These interpolation formulas constitute key pieces of the simultaneous approximation terms. The overall discretization is shown to be energy-conserving and examined on test cases of both theoretical and practical interests, delivering accurate and stable simulation results.
NAFeb 6, 2016
Fast Multipole Method as a Matrix-Free Hierarchical Low-Rank ApproximationRio Yokota, Huda Ibeid, David Keyes
There has been a large increase in the amount of work on hierarchical low-rank approximation methods, where the interest is shared by multiple communities that previously did not intersect. This objective of this article is two-fold; to provide a thorough review of the recent advancements in this field from both analytical and algebraic perspectives, and to present a comparative benchmark of two highly optimized implementations of contrasting methods for some simple yet representative test cases. We categorize the recent advances in this field from the perspective of compute-memory tradeoff, which has not been considered in much detail in this area. Benchmark tests reveal that there is a large difference in the memory consumption and performance between the different methods.
NAFeb 22, 2018
Combining finite element and finite difference methods for isotropic elastic wave simulations in an energy-conserving mannerLongfei Gao, David Keyes
We consider numerical simulation of the isotropic elastic wave equations arising from seismic applications with non-trivial land topography. The more flexible finite element method is applied to the shallow region of the simulation domain to account for the topography, and combined with the more efficient finite difference method that is applied to the deep region of the simulation domain. We demonstrate that these two discretization methods, albeit starting from different formulations of the elastic wave equation, can be joined together smoothly via weakly imposed interface conditions. Discrete energy analysis is employed to derive the proper interface treatment, leading to an overall discretization that is energy-conserving. Numerical examples are presented to demonstrate the efficacy of the proposed interface treatment.
NAJul 3, 2018
Tucker Tensor analysis of Matern functions in spatial statisticsAlexander Litvinenko, David Keyes, Venera Khoromskaia et al.
In this work, we describe advanced numerical tools for working with multivariate functions and for the analysis of large data sets. These tools will drastically reduce the required computing time and the storage cost, and, therefore, will allow us to consider much larger data sets or finer meshes. Covariance matrices are crucial in spatio-temporal statistical tasks, but are often very expensive to compute and store, especially in 3D. Therefore, we approximate covariance functions by cheap surrogates in a low-rank tensor format. We apply the Tucker and canonical tensor decompositions to a family of Matern- and Slater-type functions with varying parameters and demonstrate numerically that their approximations exhibit exponentially fast convergence. We prove the exponential convergence of the Tucker and canonical approximations in tensor rank parameters. Several statistical operations are performed in this low-rank tensor format, including evaluating the conditional covariance matrix, spatially averaged estimation variance, computing a quadratic form, determinant, trace, loglikelihood, inverse, and Cholesky decomposition of a large covariance matrix. Low-rank tensor approximations reduce the computing and storage costs essentially. For example, the storage cost is reduced from an exponential $\mathcal{O}(n^d)$ to a linear scaling $\mathcal{O}(drn)$, where $d$ is the spatial dimension, $n$ is the number of mesh points in one direction, and $r$ is the tensor rank. Prerequisites for applicability of the proposed techniques are the assumptions that the data, locations, and measurements lie on a tensor (axes-parallel) grid and that the covariance function depends on a distance, $\Vert x-y \Vert$.
NAJan 19, 2016
Fast Multipole Preconditioners for Sparse Matrices Arising from Elliptic EquationsHuda Ibeid, Rio Yokota, Jennifer Pestana et al.
Among optimal hierarchical algorithms for the computational solution of elliptic problems, the Fast Multipole Method (FMM) stands out for its adaptability to emerging architectures, having high arithmetic intensity, tunable accuracy, and relaxable global synchronization requirements. We demonstrate that, beyond its traditional use as a solver in problems for which explicit free-space kernel representations are available, the FMM has applicability as a preconditioner in finite domain elliptic boundary value problems, by equipping it with boundary integral capability for satisfying conditions at finite boundaries and by wrapping it in a Krylov method for extensibility to more general operators. Here, we do not discuss the well developed applications of FMM to implement matrix-vector multiplications within Krylov solvers of boundary element methods. Instead, we propose using FMM for the volume-to-volume contribution of inhomogeneous Poisson-like problems, where the boundary integral is a small part of the overall computation. Our method may be used to precondition sparse matrices arising from finite difference/element discretizations, and can handle a broader range of scientific applications. Compared with multigrid methods, it is capable of comparable algebraic convergence rates down to the truncation error of the discretized PDE, and it offers potentially superior multicore and distributed memory scalability properties on commodity architecture supercomputers. Compared with other methods exploiting the low rank character of off-diagonal blocks of the dense resolvent operator, FMM-preconditioned Krylov iteration may reduce the amount of communication because it is matrix-free and exploits the tree structure of FMM. We describe our tests in reproducible detail with freely available codes and outline directions for further extensibility.
APJan 4, 2018
Accelerated Cyclic Reduction: A Distributed-Memory Fast Solver for Structured Linear SystemsGustavo Chávez, George Turkiyyah, Stefano Zampini et al.
We present Accelerated Cyclic Reduction (ACR), a distributed-memory fast direct solver for rank-compressible block tridiagonal linear systems arising from the discretization of elliptic operators, developed here for three dimensions. Algorithmic synergies between Cyclic Reduction and hierarchical matrix arithmetic operations result in a solver that has $O(k~N \log N~(\log N + k^2))$ arithmetic complexity and $O(k~N \log N)$ memory footprint, where $N$ is the number of degrees of freedom and $k$ is the rank of a typical off-diagonal block, and which exhibits substantial concurrency. We provide a baseline for performance and applicability by comparing with the multifrontal method where hierarchical semi-separable matrices are used for compressing the fronts, and with algebraic multigrid. Over a set of large-scale elliptic systems with features of nonsymmetry and indefiniteness, the robustness of the direct solvers extends beyond that of the multigrid solver, and relative to the multifrontal approach ACR has lower or comparable execution time and memory footprint. ACR exhibits good strong and weak scaling in a distributed context and, as with any direct solver, is advantageous for problems that require the solution of multiple right-hand sides.
NADec 24, 2017
Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficientsGustavo Chávez, George Turkiyyah, Stefano Zampini et al.
We present a robust and scalable preconditioner for the solution of large-scale linear systems that arise from the discretization of elliptic PDEs amenable to rank compression. The preconditioner is based on hierarchical low-rank approximations and the cyclic reduction method. The setup and application phases of the preconditioner achieve log-linear complexity in memory footprint and number of operations, and numerical experiments exhibit good weak and strong scalability at large processor counts in a distributed memory environment. Numerical experiments with linear systems that feature symmetry and nonsymmetry, definiteness and indefiniteness, constant and variable coefficients demonstrate the preconditioner applicability and robustness. Furthermore, it is possible to control the number of iterations via the accuracy threshold of the hierarchical matrix approximations and their arithmetic operations, and the tuning of the admissibility condition parameter. Together, these parameters allow for optimization of the memory requirements and performance of the preconditioner.
NAApr 3, 2016
A Direct Elliptic Solver Based on Hierarchically Low-rank Schur ComplementsGustavo Chávez, George Turkiyyah, David Keyes
A parallel fast direct solver for rank-compressible block tridiagonal linear systems is presented. Algorithmic synergies between Cyclic Reduction and Hierarchical matrix arithmetic operations result in a solver with $O(N \log^2 N)$ arithmetic complexity and $O(N \log N)$ memory footprint. We provide a baseline for performance and applicability by comparing with well known implementations of the $\mathcal{H}$-LU factorization and algebraic multigrid with a parallel implementation that leverages the concurrency features of the method. Numerical experiments reveal that this method is comparable with other fast direct solvers based on Hierarchical Matrices such as $\mathcal{H}$-LU and that it can tackle problems where algebraic multigrid fails to converge.
LGMar 18, 2024
PETScML: Second-order solvers for training regression problems in Scientific Machine LearningStefano Zampini, Umberto Zerbinati, George Turkiyyah et al.
In recent years, we have witnessed the emergence of scientific machine learning as a data-driven tool for the analysis, by means of deep-learning techniques, of data produced by computational science and engineering applications. At the core of these methods is the supervised training algorithm to learn the neural network realization, a highly non-convex optimization problem that is usually solved using stochastic gradient methods. However, distinct from deep-learning practice, scientific machine-learning training problems feature a much larger volume of smooth data and better characterizations of the empirical risk functions, which make them suited for conventional solvers for unconstrained optimization. We introduce a lightweight software framework built on top of the Portable and Extensible Toolkit for Scientific computation to bridge the gap between deep-learning software and conventional solvers for unconstrained minimization. We empirically demonstrate the superior efficacy of a trust region method based on the Gauss-Newton approximation of the Hessian in improving the generalization errors arising from regression tasks when learning surrogate models for a wide range of scientific machine-learning techniques and test cases. All the conventional second-order solvers tested, including L-BFGS and inexact Newton with line-search, compare favorably, either in terms of cost or accuracy, with the adaptive first-order methods used to validate the surrogate models.
CVJun 11, 2025
Synthetic Geology: Structural Geology Meets Deep LearningSimon Ghyselincks, Valeriia Okhmak, Stefano Zampini et al.
Reconstructing the structural geology and mineral composition of the first few kilometers of the Earth's subsurface from sparse or indirect surface observations remains a long-standing challenge with critical applications in mineral exploration, geohazard assessment, and geotechnical engineering. This inherently ill-posed problem is often addressed by classical geophysical inversion methods, which typically yield a single maximum-likelihood model that fails to capture the full range of plausible geology. The adoption of modern deep learning methods has been limited by the lack of large 3D training datasets. We address this gap with \textit{StructuralGeo}, a geological simulation engine that mimics eons of tectonic, magmatic, and sedimentary processes to generate a virtually limitless supply of realistic synthetic 3D lithological models. Using this dataset, we train both unconditional and conditional generative flow-matching models with a 3D attention U-net architecture. The resulting foundation model can reconstruct multiple plausible 3D scenarios from surface topography and sparse borehole data, depicting structures such as layers, faults, folds, and dikes. By sampling many reconstructions from the same observations, we introduce a probabilistic framework for estimating the size and extent of subsurface features. While the realism of the output is bounded by the fidelity of the training data to true geology, this combination of simulation and generative AI functions offers a flexible prior for probabilistic modeling, regional fine-tuning, and use as an AI-based regularizer in traditional geophysical inversion workflows.
LGJul 29, 2024
Constructing artificial life and materials scientists with accelerated AI using Deep AndersoNNSaleem Abdul Fattah Ahmed Al Dajani, David Keyes
Deep AndersoNN accelerates AI by exploiting the continuum limit as the number of explicit layers in a neural network approaches infinity and can be taken as a single implicit layer, known as a deep equilibrium model. Solving for deep equilibrium model parameters reduces to a nonlinear fixed point iteration problem, enabling the use of vector-to-vector iterative solvers and windowing techniques, such as Anderson extrapolation, for accelerating convergence to the fixed point deep equilibrium. Here we show that Deep AndersoNN achieves up to an order of magnitude of speed-up in training and inference. The method is demonstrated on density functional theory results for industrial applications by constructing artificial life and materials `scientists' capable of classifying drugs as strongly or weakly polar, metal-organic frameworks by pore size, and crystalline materials as metals, semiconductors, and insulators, using graph images of node-neighbor representations transformed from atom-bond networks. Results exhibit accuracy up to 98\% and showcase synergy between Deep AndersoNN and machine learning capabilities of modern computing architectures, such as GPUs, for accelerated computational life and materials science by quickly identifying structure-property relationships. This paves the way for saving up to 90\% of compute required for AI, reducing its carbon footprint by up to 60 gigatons per year by 2030, and scaling above memory limits of explicit neural networks in life and materials science, and beyond.
NAMay 6, 2019
Propagation of Uncertainties in Density-Driven FlowAlexander Litvinenko, Dmitry Logashenko, Raul Tempone et al.
Accurate modeling of contamination in subsurface flow and water aquifers is crucial for agriculture and environmental protection. Here, we demonstrate a parallel method to quantify the propagation of the uncertainty in the dispersal of pollution in subsurface flow. Specifically, we consider the density-driven flow and estimate how uncertainty from permeability and porosity propagates to the solution. We take an Elder-like problem as a numerical benchmark and we use random fields to model the limited knowledge on the porosity and permeability. We construct a low-cost generalized polynomial chaos expansion (gPC) surrogate model, where the gPC coefficients are computed by projection on sparse and full tensor grids. We parallelize both the numerical solver for the deterministic problem based on the multigrid method, and the quadrature over the parametric space