NA Jun 1
Learning Chaotic Dynamics through Second-Order Geometric Supervision
Shinhoo Kang, Hai V. Nguyen, Tan Bui-Thanh
Learning chaotic dynamical systems from data requires more than short-term predictive accuracy: the learned model must preserve the attractor geometry and its invariant statistics. Trajectory (zero-order) and Jacobian (first-order) matching supervise the values and tangent structure of the vector field, but neither constrains how the field bends away from its tangent plane. A model can thus match values and tangents at the supervised states yet curve differently from the truth, remaining locally accurate while drifting toward spurious attractors and distorting long-time statistics. We show that enforcing second-order consistency mitigates these failures, but forming the full Hessian is prohibitive in high dimensions. We propose model-constrained randomized Jacobian matching, which compares the Jacobians of the true and learned vector fields at randomly perturbed inputs. A Taylor expansion shows that the expected randomized Jacobian loss decomposes into the nominal Jacobian mismatch plus a Hessian mismatch scaled by the noise variance, implicitly enforcing second-order consistency at $\mathcal{O}(d^2)$ cost without forming the $\mathcal{O}(d^3)$ Hessian tensor. Using only Jacobian evaluations, the method scales to high dimensions where explicit Hessian matching does not. Numerical experiments confirm that second-order methods are robust. For Lorenz~63, first-order methods produce catastrophic Lyapunov-exponent outliers under minimal temporal supervision, which second-order methods eliminate while recovering the correct attractor. For coupled Lorenz~96, an out-of-distribution forcing sweep separates the methods: all agree up to $F=16$, but beyond $F=18$ only second-order methods preserve the invariant measure and Lyapunov spectrum. On both systems, randomized Jacobian matching performs comparably to explicit Hessian matching at much lower cost.
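The decomposition stated in this abstract can be checked numerically on a toy problem. The sketch below is not the authors' implementation: it assumes hypothetical quadratic vector fields f(x) = A x + 0.5 H[x, x], for which the Taylor expansion of the Jacobian is exact, and isotropic Gaussian perturbations with standard deviation `sigma`. Under those assumptions the expected randomized Jacobian loss equals the nominal Jacobian mismatch plus `sigma**2` times the squared Frobenius norm of the Hessian mismatch. All names (`jacobian`, `randomized_jacobian_loss`, `symmetrize`) are illustrative.

```python
import numpy as np

def jacobian(A, H, x):
    """Jacobian of f(x) = A x + 0.5 * H[x, x], with H symmetric in its last two indices."""
    return A + np.einsum("ijk,k->ij", H, x)

def randomized_jacobian_loss(A_true, H_true, A_model, H_model, x0, sigma, n_samples, rng):
    """Monte Carlo estimate of E || J_true(x0 + eps) - J_model(x0 + eps) ||_F^2."""
    eps = sigma * rng.standard_normal((n_samples, x0.size))
    dA, dH = A_true - A_model, H_true - H_model
    # Jacobian mismatch at every perturbed state, shape (n_samples, d, d).
    dJ = dA + np.einsum("ijk,nk->nij", dH, x0 + eps)
    return np.mean(np.sum(dJ**2, axis=(1, 2)))

def symmetrize(H):
    """Make H[i, j, k] symmetric in (j, k), as a Hessian tensor must be."""
    return 0.5 * (H + H.transpose(0, 2, 1))

rng = np.random.default_rng(0)
d, sigma, n_samples = 4, 0.1, 100_000
A_true = rng.standard_normal((d, d))
H_true = symmetrize(rng.standard_normal((d, d, d)))
H_model = symmetrize(rng.standard_normal((d, d, d)))
hessian_term = sigma**2 * np.sum((H_true - H_model) ** 2)

# Case 1: the model matches the Jacobian exactly at x0 = 0 but curves differently.
x0 = np.zeros(d)
loss_matched = randomized_jacobian_loss(A_true, H_true, A_true, H_model, x0, sigma, n_samples, rng)

# Case 2: both the Jacobian and the Hessian differ at a generic state.
x1 = rng.standard_normal(d)
A_model = rng.standard_normal((d, d))
nominal = np.sum((jacobian(A_true, H_true, x1) - jacobian(A_model, H_model, x1)) ** 2)
loss_general = randomized_jacobian_loss(A_true, H_true, A_model, H_model, x1, sigma, n_samples, rng)

print("matched Jacobian:", loss_matched, "vs sigma^2 * ||dH||^2 =", hessian_term)
print("general case:    ", loss_general, "vs nominal + Hessian term =", nominal + hessian_term)
```

In the first case plain Jacobian matching at `x0` reports zero loss, while the randomized loss still detects the curvature mismatch, which is the failure mode the abstract describes.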
NA Jul 2, 2016
Accelerating MCMC with active subspaces
Paul G. Constantine, Carson Kent, Tan Bui-Thanh
The Markov chain Monte Carlo (MCMC) method is the computational workhorse for Bayesian inverse problems. However, MCMC struggles in high-dimensional parameter spaces, since its iterates must sequentially explore the high-dimensional space. This struggle is compounded in physical applications when the nonlinear forward model is computationally expensive. One approach to accelerate MCMC is to reduce the dimension of the state space. Active subspaces are part of an emerging set of tools for subspace-based dimension reduction. An active subspace in a given inverse problem indicates a separation between a low-dimensional subspace that is informed by the data and its orthogonal complement that is constrained by the prior. With this information, one can run the sequential MCMC on the active variables while sampling independently according to the prior on the inactive variables. However, this approach to increase efficiency may introduce bias. We provide a bound on the Hellinger distance between the true posterior and its active subspace-exploiting approximation. We also demonstrate the active subspace-accelerated MCMC on two computational examples: (i) a two-dimensional parameter space with a quadratic forward model and a one-dimensional active subspace and (ii) a 100-dimensional parameter space with a PDE-based forward model and a two-dimensional active subspace.
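The split into active and inactive variables can be illustrated with the standard active-subspace construction: average the outer products of misfit gradients over prior samples and take the leading eigenvectors. The sketch below is only an illustration under stated assumptions. The linear forward model `w`, the observation `y_obs`, and the standard normal prior are hypothetical and are not the examples from the paper; they are chosen so that the active subspace is exactly one-dimensional.

```python
import numpy as np

def active_subspace(grads, k):
    """Split parameter space using the eigendecomposition of C = mean(grad grad^T)."""
    C = grads.T @ grads / grads.shape[0]
    eigvals, eigvecs = np.linalg.eigh(C)              # eigenvalues in ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending
    return eigvals, eigvecs[:, :k], eigvecs[:, k:]

rng = np.random.default_rng(0)
dim, n_samples = 10, 500
w = rng.standard_normal(dim)   # hypothetical linear forward model G(x) = w . x
y_obs = 1.3                    # hypothetical scalar observation

# Misfit f(x) = 0.5 * (w . x - y_obs)^2 has gradient (w . x - y_obs) * w.
prior_samples = rng.standard_normal((n_samples, dim))  # assumed standard normal prior
residuals = prior_samples @ w - y_obs
grads = residuals[:, None] * w[None, :]

eigvals, W_active, W_inactive = active_subspace(grads, k=1)

# Every gradient is parallel to w, so the single active direction aligns with w.
alignment = abs(W_active[:, 0] @ w) / np.linalg.norm(w)
print("alignment with w:", alignment)
print("eigenvalue ratio:", abs(eigvals[1]) / eigvals[0])
```

In an MCMC setting, the chain would then move only along `W_active` while the coordinates along `W_inactive` are drawn independently from the prior, as the abstract describes.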
NA Feb 5, 2019
Scalable matrix-free adaptive product-convolution approximation for locally translation-invariant operators
Nick Alger, Vishwas Rao, Aaron Myers et al.
We present an adaptive grid matrix-free operator approximation scheme based on a "product-convolution" interpolation of convolution operators. This scheme is appropriate for operators that are locally translation-invariant, even if these operators are high-rank or full-rank. Such operators arise in Schur complement methods for solving partial differential equations (PDEs), as Hessians in PDE-constrained optimization and inverse problems, as integral operators, as covariance operators, and as Dirichlet-to-Neumann maps. Constructing the approximation requires computing the impulse responses of the operator to point sources centered on nodes in an adaptively refined grid of sample points. A randomized a-posteriori error estimator drives the adaptivity. Once constructed, the approximation can be efficiently applied to vectors using the fast Fourier transform. The approximation can be efficiently converted to hierarchical matrix ($H$-matrix) format, then inverted or factorized using scalable $H$-matrix arithmetic. The quality of the approximation degrades gracefully as fewer sample points are used, allowing cheap lower quality approximations to be used as preconditioners. This yields an automated method to construct preconditioners for locally translation-invariant Schur complements. We directly address issues related to boundaries and prove that our scheme eliminates boundary artifacts. We test the scheme on a spatially varying blurring kernel, on the non-local component of an interface Schur complement for the Poisson operator, and on the data misfit Hessian for an advection dominated advection-diffusion inverse problem. Numerical results show that the scheme outperforms existing methods.
NA Jan 12, 2019
Analysis of an HDG Method for Linearized Incompressible Resistive MHD Equations
Jeonghun J. Lee, Stephen Shannon, Tan Bui-Thanh et al.
We present a hybridized discontinuous Galerkin (HDG) method for stationary linearized incompressible magnetohydrodynamics (MHD) equations. At the heart of the paper is the introduction of an HDG flux of the dual saddle-point form of the MHD equations that facilitates the hybridization of the discontinuous Galerkin (DG) method. We derive $\textit{a priori}$ error estimates for the proposed HDG method on simplicial meshes in both two and three dimensions. The analysis provides optimal convergence for the fluid velocity and the magnetic variables, and quasi-optimal convergence for the remaining quantities. Numerical examples are presented to verify the theoretical findings.
CE Nov 7, 2017
IMEX HDG-DG: a coupled implicit hybridized discontinuous Galerkin (HDG) and explicit discontinuous Galerkin (DG) approach for shallow water systems
Shinhoo Kang, Francis X. Giraldo, Tan Bui-Thanh
We propose IMEX HDG-DG schemes for planar and spherical shallow water systems. Of interest is subcritical flow, where the speed of the gravity wave is faster than that of nonlinear advection. In order to simulate these flows efficiently, we split the governing system into a stiff part describing the gravity wave and a non-stiff part associated with nonlinear advection. The former is discretized implicitly with the HDG method while an explicit Runge-Kutta DG discretization is employed for the latter. The proposed IMEX HDG-DG framework: 1) facilitates high-order solutions both in time and space; 2) avoids overly small time-step sizes; 3) requires only one linear system solve per time stage; 4) relative to DG generates smaller and sparser linear systems while promoting further parallelism. Numerical results of various test cases demonstrate that our methods are comparable to explicit Runge-Kutta DG schemes in terms of accuracy while allowing for much larger time step sizes.
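The implicit-explicit splitting idea can be shown on a small ODE system. The sketch below swaps in a first-order IMEX Euler step in place of the Runge-Kutta IMEX schemes and HDG/DG spatial discretizations of the paper, and the stiff operator `L` and the non-stiff term `nonstiff` are hypothetical stand-ins for the gravity-wave and advection parts. It shows the two properties listed in the abstract that survive this simplification: one linear solve per step, and stability at a time step where a fully explicit method fails.

```python
import numpy as np

def imex_euler(u0, L, nonstiff, dt, n_steps):
    """First-order IMEX Euler for u' = L u + nonstiff(u): L implicit, nonstiff explicit."""
    lhs = np.eye(u0.size) - dt * L
    u = u0.copy()
    for _ in range(n_steps):
        # One linear system solve per step; the non-stiff term is evaluated explicitly.
        u = np.linalg.solve(lhs, u + dt * nonstiff(u))
    return u

def explicit_euler(u0, L, nonstiff, dt, n_steps):
    """Fully explicit Euler on the same system, for comparison."""
    u = u0.copy()
    for _ in range(n_steps):
        u = u + dt * (L @ u + nonstiff(u))
    return u

def nonstiff(u):
    """Hypothetical bounded non-stiff term."""
    return 0.1 * np.cos(u)

L = np.diag([-1000.0, -1.0])   # hypothetical stiff linear operator
u0 = np.array([1.0, 1.0])
dt, n_steps = 0.1, 50          # dt far above the explicit stability limit of 2/1000

u_imex = imex_euler(u0, L, nonstiff, dt, n_steps)
u_explicit = explicit_euler(u0, L, nonstiff, dt, n_steps)
print("IMEX Euler:    ", u_imex)
print("explicit Euler:", u_explicit)
```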
CE Nov 18, 2018
A Hybridized Discontinuous Galerkin Method for A Linear Degenerate Elliptic Equation Arising from Two-Phase Mixtures
Shinhoo Kang, Tan Bui-Thanh, Todd Arbogast
We develop a high-order hybridized discontinuous Galerkin (HDG) method for a linear degenerate elliptic equation arising from a two-phase mixture of mantle convection or glacier dynamics. We show that the proposed HDG method is well-posed by using an energy approach. We derive $\textit{a priori}$ error estimates for the proposed HDG method on simplicial meshes in both two and three dimensions. The error analysis shows that the convergence rates are optimal for both the scaled pressure and the scaled velocity for non-degenerate problems and are sub-optimal by half order for degenerate ones. Several numerical results are presented to confirm the theoretical estimates. We also enhance the HDG solutions by post-processing. The superconvergence rates of $(k+2)$ and $(k+\frac{3}{2})$ are observed for both a non-degenerate case and a degenerate case away from the degeneracy. Degenerate problems with low regularity solutions are also studied, and numerical results show that high-order methods are beneficial in terms of accuracy.
NA Nov 24, 2018
Unified geometric multigrid algorithm for hybridized high-order finite element methods
Tim Wildey, Sriramkrishnan Muralikrishnan, Tan Bui-Thanh
We consider a standard elliptic partial differential equation and propose a geometric multigrid algorithm based on Dirichlet-to-Neumann (DtN) maps for hybridized high-order finite element methods. The proposed unified approach is applicable to any locally conservative hybridized finite element method including multinumerics with different hybridized methods in different parts of the domain. For these methods, the linear system involves only the unknowns residing on the mesh skeleton, and constructing intergrid transfer operators is therefore not trivial. The key to our geometric multigrid algorithm is the physics-based energy-preserving intergrid transfer operators which depend only on the fine scale DtN maps. Thanks to these operators, we completely avoid upscaling of parameters and no information regarding subgrid physics is explicitly required on coarse meshes. Moreover, our algorithm is agglomeration-based and can straightforwardly handle unstructured meshes. We perform extensive numerical studies with hybridized mixed methods, hybridized discontinuous Galerkin method, weak Galerkin method, and a hybridized version of interior penalty discontinuous Galerkin methods on a range of elliptic problems including subsurface flow through highly heterogeneous porous media. We compare the performance of different smoothers and analyze the effect of stabilization parameters on the scalability of the multigrid algorithm.
NA Nov 14, 2017
An Improved Iterative HDG Approach for Partial Differential Equations
Sriramkrishnan Muralikrishnan, Minh-Binh Tran, Tan Bui-Thanh
We propose and analyze an iterative high-order hybridized discontinuous Galerkin (iHDG) discretization for linear partial differential equations. We improve our previous work (SIAM J. Sci. Comput. Vol. 39, No. 5, pp. S782--S808) in several directions: 1) the improved iHDG approach converges in a finite number of iterations for the scalar transport equation; 2) it is unconditionally convergent for both the linearized shallow water system and the convection-diffusion equation; 3) it has improved stability and convergence rates; 4) we uncover a relationship between the number of iterations and the time stepsize, solution order, meshsize, and equation parameters, which allows us to choose the time stepsize such that the number of iterations is approximately independent of the solution order and the meshsize; and 5) we provide both strong and weak scalings of the improved iHDG approach up to $16,384$ cores. A connection between iHDG and time integration methods such as parareal and implicit/explicit methods is discussed. Extensive numerical results are presented to verify the theoretical findings.
NA Apr 17, 2017
A Data-Scalable Randomized Misfit Approach for Solving Large-Scale PDE-Constrained Inverse Problems
Ellen B. Le, Aaron Myers, Tan Bui-Thanh et al.
A randomized misfit approach is presented for the efficient solution of large-scale PDE-constrained inverse problems with high-dimensional data. The purpose of this paper is to offer a theory-based framework for random projections in this inverse problem setting. The stochastic approximation to the misfit is analyzed using random projection theory. By expanding beyond mean estimator convergence, a practical characterization of randomized misfit convergence can be achieved. The theoretical results developed hold with any valid random projection in the literature. The class of feasible distributions is broad yet simple to characterize compared to previous stochastic misfit methods. This class includes very sparse random projections which provide additional computational benefit. A different proof for a variant of the Johnson-Lindenstrauss lemma is also provided. This leads to a different intuition for the $O(\varepsilon^{-2})$ factor in bounds for Johnson-Lindenstrauss results. The main contribution of this paper is a theoretical result showing the method guarantees a valid solution for small reduced misfit dimensions. The interplay between Johnson-Lindenstrauss theory and Morozov's discrepancy principle is shown to be essential to the result. The computational cost savings for large-scale PDE-constrained problems with high-dimensional data are discussed. Numerical verification of the developed theory is presented for model problems of estimating a distributed parameter in an elliptic partial differential equation. Results with different random projections are presented to demonstrate the viability and accuracy of the proposed approach.
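The core idea of a randomized misfit can be sketched in a few lines: project the high-dimensional residual with a random matrix scaled so that the squared norm is preserved in expectation, then evaluate the misfit in the reduced space. This is a minimal sketch, not the paper's algorithm. The residual vector is a random stand-in for a real data-misfit vector, a dense Gaussian projection is assumed (the abstract notes that sparser choices are also valid), and the sizes `n_data` and `n_reduced` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_data, n_reduced = 5000, 500

# Stand-in for the data-misfit vector (observed data minus predicted data).
residual = rng.standard_normal(n_data)

# Gaussian random projection scaled so that E[S^T S] = I.
S = rng.standard_normal((n_reduced, n_data)) / np.sqrt(n_reduced)

full_misfit = 0.5 * residual @ residual
reduced_misfit = 0.5 * np.sum((S @ residual) ** 2)
relative_error = abs(reduced_misfit - full_misfit) / full_misfit

print("full misfit:   ", full_misfit)
print("reduced misfit:", reduced_misfit)
print("relative error:", relative_error)
```

For this Gaussian choice the relative error has standard deviation `sqrt(2 / n_reduced)`, which matches the Johnson-Lindenstrauss intuition that the reduced dimension scales like $\varepsilon^{-2}$ for a target distortion $\varepsilon$.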
LG Aug 9, 2022
A Model-Constrained Tangent Slope Learning Approach for Dynamical Systems
Hai V. Nguyen, Tan Bui-Thanh
Real-time accurate solutions of large-scale complex dynamical systems are in critical need for control, optimization, uncertainty quantification, and decision-making in practical engineering and science applications, especially digital twin applications. This paper contributes in this direction a model-constrained tangent slope learning (mcTangent) approach. At the heart of mcTangent is the synergy of several desirable strategies: i) a tangent slope learning to take advantage of the neural network speed and the time-accurate nature of the method of lines; ii) a model-constrained approach to encode the neural network tangent slope with the underlying governing equations; iii) sequential learning strategies to promote long-time stability and accuracy; and iv) a data randomization approach to implicitly enforce the smoothness of the neural network tangent slope and its closeness to the true tangent slope up to second-order derivatives in order to further enhance the stability and accuracy of mcTangent solutions. Rigorous results are provided to analyze and justify the proposed approach. Several numerical results for the transport equation, viscous Burgers equation, and Navier-Stokes equation are presented to study and demonstrate the robustness and long-time accuracy of the proposed mcTangent learning approach.
ML Sep 27, 2024
A Model-Constrained Discontinuous Galerkin Network (DGNet) for Compressible Euler Equations with Out-of-Distribution Generalization
Hai V. Nguyen, Jau-Uei Chen, Tan Bui-Thanh
Real-time accurate solutions of large-scale complex dynamical systems are critically needed for control, optimization, uncertainty quantification, and decision-making in practical engineering and science applications, particularly in digital twin contexts. In this work, we develop a model-constrained discontinuous Galerkin Network (DGNet) approach, a significant extension to our previous work [Model-constrained Tangent Slope Learning Approach for Dynamical Systems], for compressible Euler equations with out-of-distribution generalization. The core of DGNet is the synergy of several key strategies: (i) leveraging time integration schemes to capture temporal correlation and taking advantage of neural network speed for computation time reduction; (ii) employing a model-constrained approach to ensure the learned tangent slope satisfies governing equations; (iii) utilizing a GNN-inspired architecture where edges represent Riemann solver surrogate models and nodes represent volume integration correction surrogate models, enabling discontinuity capturing, aliasing error reduction, and mesh discretization generalizability; (iv) implementing the input normalization technique that allows surrogate models to generalize across different initial conditions, geometries, meshes, boundary conditions, and solution orders; and (v) incorporating a data randomization technique that not only implicitly promotes agreement between surrogate models and true numerical models up to second-order derivatives, ensuring long-term stability and prediction capacity, but also serves as a data generation engine during training, leading to enhanced generalization on unseen data. To validate the effectiveness, stability, and generalizability of our novel DGNet approach, we present comprehensive numerical results for 1D and 2D compressible Euler equation problems.
NA Apr 10, 2023
An autoencoder compression approach for accelerating large-scale inverse problems
Jonathan Wittmer, Jacob Badger, Hari Sundar et al.
PDE-constrained inverse problems are some of the most challenging and computationally demanding problems in computational science today. Fine meshes that are required to accurately compute the PDE solution introduce an enormous number of parameters and require large scale computing resources such as more processors and more memory to solve such systems in a reasonable time. For inverse problems constrained by time dependent PDEs, the adjoint method that is often employed to efficiently compute gradients and higher order derivatives requires solving a time-reversed, so-called adjoint PDE that depends on the forward PDE solution at each timestep. This necessitates the storage of a high dimensional forward solution vector at every timestep. Such a procedure quickly exhausts the available memory resources. Several approaches that trade additional computation for reduced memory footprint have been proposed to mitigate the memory bottleneck, including checkpointing and compression strategies. In this work, we propose a close-to-ideal scalable compression approach using autoencoders to eliminate the need for checkpointing and substantial memory storage, thereby reducing both the time-to-solution and memory requirements. We compare our approach with checkpointing and an off-the-shelf compression approach on an earth-scale ill-posed seismic inverse problem. The results verify the expected close-to-ideal speedup for both the gradient and Hessian-vector product using the proposed autoencoder compression approach. To highlight the usefulness of the proposed approach, we combine the autoencoder compression with the data-informed active subspace (DIAS) prior to show how the DIAS method can be affordably extended to large scale problems without the need of checkpointing and large memory.
LG Nov 13, 2022
An Adaptive and Stability-Promoting Layerwise Training Approach for Sparse Deep Neural Network Architecture
C G Krishnanunni, Tan Bui-Thanh
This work presents a two-stage adaptive framework for progressively developing deep neural network (DNN) architectures that generalize well for a given training data set. In the first stage, a layerwise training approach is adopted where a new layer is added each time and trained independently by freezing parameters in the previous layers. We impose desirable structures on the DNN by employing manifold regularization, sparsity regularization, and physics-informed terms. We introduce an epsilon-delta stability-promoting concept as a desirable property for a learning algorithm and show that employing manifold regularization yields an epsilon-delta stability-promoting algorithm. Further, we also derive the necessary conditions for the trainability of a newly added layer and investigate the training saturation problem. In the second stage of the algorithm (post-processing), a sequence of shallow networks is employed to extract information from the residual produced in the first stage, thereby improving the prediction accuracy. Numerical investigations on prototype regression and classification problems demonstrate that the proposed approach can outperform fully connected DNNs of the same size. Moreover, by equipping the physics-informed neural network (PINN) with the proposed adaptive architecture strategy to solve partial differential equations, we numerically show that adaptive PINNs not only are superior to standard PINNs but also produce interpretable hidden layers with provable stability. We also apply our architecture design strategy to solve inverse problems governed by elliptic partial differential equations.
LG Mar 23
Generalization Limits of In-Context Operator Networks for Higher-Order Partial Differential Equations
Jamie Mahowald, Tan Bui-Thanh
We investigate the generalization capabilities of In-Context Operator Networks (ICONs), a new class of operator networks that build on the principles of in-context learning, for higher-order partial differential equations. We extend previous work by expanding the type and scope of differential equations handled by the foundation model. We demonstrate that while processing complex inputs requires some new computational methods, the underlying machine learning techniques are largely consistent with simpler cases. Our implementation shows that although point-wise accuracy degrades for higher-order problems like the heat equation, the model retains qualitative accuracy in capturing solution dynamics and overall behavior. This demonstrates the model's ability to extrapolate fundamental solution characteristics to problems outside its training regime.
AI Feb 26
The AI Research Assistant: Promise, Peril, and a Proof of Concept
Tan Bui-Thanh
Can artificial intelligence truly contribute to creative mathematical research, or does it merely automate routine calculations while introducing risks of error? We provide empirical evidence through a detailed case study: the discovery of novel error representations and bounds for Hermite quadrature rules via systematic human-AI collaboration. Working with multiple AI assistants, we extended results beyond what manual work achieved, formulating and proving several theorems with AI assistance. The collaboration revealed both remarkable capabilities and critical limitations. AI excelled at algebraic manipulation, systematic proof exploration, literature synthesis, and LaTeX preparation. However, every step required rigorous human verification, mathematical intuition for problem formulation, and strategic direction. We document the complete research workflow with unusual transparency, revealing patterns in successful human-AI mathematical collaboration and identifying failure modes researchers must anticipate. Our experience suggests that, when used with appropriate skepticism and verification protocols, AI tools can meaningfully accelerate mathematical discovery while demanding careful human oversight and deep domain expertise.
LG Feb 8, 2025
Topological derivative approach for deep neural network architecture adaptation
C G Krishnanunni, Tan Bui-Thanh, Clint Dawson
This work presents a novel algorithm for progressively adapting neural network architecture along the depth. In particular, we attempt to address the following questions in a mathematically principled way: i) Where to add a new capacity (layer) during the training process? ii) How to initialize the new capacity? At the heart of our approach are two key ingredients: i) the introduction of a "shape functional" to be minimized, which depends on neural network topology, and ii) the introduction of a topological derivative of the shape functional with respect to the neural network topology. Using an optimal control viewpoint, we show that the network topological derivative exists under certain conditions, and its closed-form expression is derived. In particular, we explore, for the first time, the connection between the topological derivative from a topology optimization framework with the Hamiltonian from optimal control theory. Further, we show that the optimality condition for the shape functional leads to an eigenvalue problem for deep neural architecture adaptation. Our approach thus determines the most sensitive location along the depth where a new layer needs to be inserted during the training phase and the associated parametric initialization for the newly added layer. We also demonstrate that our layer insertion strategy can be derived from an optimal transport viewpoint as a solution to maximizing a topological derivative in $p$-Wasserstein space, where $p \geq 1$. Numerical investigations with fully connected network, convolutional neural network, and vision transformer on various regression and classification problems demonstrate that our proposed approach can outperform an ad-hoc baseline network and other architecture adaptation strategies. Further, we also demonstrate other applications of topological derivative in fields such as transfer learning.
LG Dec 9, 2024
TAEN: A Model-Constrained Tikhonov Autoencoder Network for Forward and Inverse Problems
Hai V. Nguyen, Tan Bui-Thanh, Clint Dawson
Efficient real-time solvers for forward and inverse problems are essential in engineering and science applications. Machine learning surrogate models have emerged as promising alternatives to traditional methods, offering substantially reduced computational time. Nevertheless, these models typically demand extensive training datasets to achieve robust generalization across diverse scenarios. While physics-based approaches can partially mitigate this data dependency and ensure physics-interpretable solutions, addressing scarce data regimes remains a challenge. Both purely data-driven and physics-based machine learning approaches demonstrate severe overfitting issues when trained with insufficient data. We propose a novel Tikhonov autoencoder model-constrained framework, called TAE, capable of learning both forward and inverse surrogate models using a single arbitrary observation sample. We develop comprehensive theoretical foundations including forward and inverse inference error bounds for the proposed approach for linear cases. For comparative analysis, we derive equivalent formulations for pure data-driven and model-constrained approach counterparts. At the heart of our approach is a data randomization strategy, which functions as a generative mechanism for exploring the training data space, enabling effective training of both forward and inverse surrogate models from a single observation, while regularizing the learning process. We validate our approach through extensive numerical experiments on two challenging inverse problems: 2D heat conductivity inversion and initial condition reconstruction for time-dependent 2D Navier-Stokes equations. Results demonstrate that TAE achieves accuracy comparable to traditional Tikhonov solvers and numerical forward solvers for both inverse and forward problems, respectively, while delivering orders of magnitude computational speedups.
ML Jan 4
Variance-Reduced Diffusion Sampling via Conditional Score Expectation Identity
Alois Duston, Tan Bui-Thanh
We introduce and prove a \textbf{Conditional Score Expectation (CSE)} identity: an exact relation for the marginal score of affine diffusion processes that links scores across time via a conditional expectation under the forward dynamics. Motivated by this identity, we propose a CSE-based statistical estimator for the score using a Self-Normalized Importance Sampling (SNIS) procedure with prior samples and forward noise. We analyze its relationship to the standard Tweedie estimator, proving anti-correlation for Gaussian targets and establishing the same behavior for general targets in the small time-step regime. Exploiting this structure, we derive a variance-minimizing blended score estimator given by a state--time dependent convex combination of the CSE and Tweedie estimators. Numerical experiments show that this optimal-blending estimator reduces variance and improves sample quality for a fixed computational budget compared to either baseline. We further extend the framework to Bayesian inverse problems via likelihood-informed SNIS weights, and demonstrate improved reconstruction quality and sample diversity on high-dimensional image reconstruction tasks and PDE-governed inverse problems.
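The benefit of blending two anti-correlated estimators follows from a standard result on minimum-variance convex combinations, sketched below for two scalar unbiased estimators. This is a generic illustration, not the paper's state- and time-dependent blend of the CSE and Tweedie estimators. The variances, the covariance, and the target value `mu` are made-up numbers, and the function names are illustrative. For estimators $A$ and $B$, the weight on $A$ that minimizes $\mathrm{Var}(\lambda A + (1-\lambda)B)$ is $\lambda^* = (\sigma_B^2 - c)/(\sigma_A^2 + \sigma_B^2 - 2c)$, where $c$ is their covariance.

```python
import numpy as np

def optimal_blend_weight(var_a, var_b, cov_ab):
    """Weight on estimator A that minimizes Var(lam * A + (1 - lam) * B)."""
    return (var_b - cov_ab) / (var_a + var_b - 2.0 * cov_ab)

def blended_variance(lam, var_a, var_b, cov_ab):
    """Variance of the blend lam * A + (1 - lam) * B."""
    return lam**2 * var_a + (1 - lam) ** 2 * var_b + 2 * lam * (1 - lam) * cov_ab

# Hypothetical second moments of two unbiased, anti-correlated estimators.
var_a, var_b, cov_ab = 1.0, 4.0, -1.0
lam = optimal_blend_weight(var_a, var_b, cov_ab)
var_blend = blended_variance(lam, var_a, var_b, cov_ab)

# Monte Carlo check with jointly Gaussian estimator noise around a true value mu.
rng = np.random.default_rng(0)
mu = 2.0
noise = rng.multivariate_normal([0.0, 0.0], [[var_a, cov_ab], [cov_ab, var_b]], size=200_000)
est_a, est_b = mu + noise[:, 0], mu + noise[:, 1]
blend = lam * est_a + (1 - lam) * est_b

print("optimal weight:", lam)
print("predicted variance:", var_blend, "empirical variance:", blend.var())
```

With these numbers the blend has lower variance than either estimator alone, and the negative covariance is what pushes it below the value an uncorrelated pair would give.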
ML Jan 14, 2025
LiLaN: A Linear Latent Network as the Solution Operator for Real-Time Solutions to Stiff Nonlinear Ordinary Differential Equations
William Cole Nockolds, C. G. Krishnanunni, Tan Bui-Thanh et al.
Solving stiff ordinary differential equations (StODEs) requires sophisticated numerical solvers, which are often computationally expensive. In general, traditional explicit time integration schemes with restricted time step sizes are not suitable for StODEs, and one must resort to costly implicit methods. On the other hand, state-of-the-art machine learning based methods, such as Neural ODE, poorly handle the timescale separation of various elements of the solutions to StODEs, while still requiring expensive implicit/explicit integration at inference time. In this work, we propose a linear latent network (LiLaN) approach in which the dynamics in the latent space can be integrated analytically, and thus numerical integration is completely avoided. At the heart of LiLaN are the following key ideas: i) two encoder networks to encode the initial condition together with parameters of the ODE to the slope and the initial condition for the latent dynamics, respectively. Since the latent dynamics, by design, are linear, the solution can be evaluated analytically; ii) a neural network to map the physical time to latent times, one for each latent variable. Finally, iii) a decoder network to decode the latent solution to the physical solution at the corresponding physical time. We provide a universal approximation theorem for the proposed LiLaN approach, showing that it can approximate the solution of any stiff nonlinear system on a compact set to any degree of accuracy epsilon. We also show an interesting fact that the dimension of the latent dynamical system in LiLaN is independent of epsilon. Numerical results on the "Robertson Stiff Chemical Kinetics Model," "Plasma Collisional-Radiative Model," and "Allen-Cahn" and "Cahn-Hilliard" PDEs suggest that LiLaN outperformed state-of-the-art machine learning approaches for handling stiff ordinary and partial differential equations.
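The three ingredients listed in this abstract can be arranged into a forward pass as sketched below. This is one reading of the abstract, not the authors' architecture: it assumes the latent dynamics have a constant slope per latent variable, so the latent solution is `z0 + slope * tau`, and it uses small untrained tanh networks with random weights purely to show the data flow. The names `mlp`, `init_mlp`, `lilan_forward`, the dictionary keys, and all layer sizes are hypothetical.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a small fully connected network (untrained, for illustration)."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply tanh hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def lilan_forward(nets, x0, p, t):
    """Map initial condition x0, ODE parameters p, and times t to predicted states."""
    inp = np.concatenate([x0, p])
    z0 = mlp(nets["init_encoder"], inp)      # latent initial condition, shape (m,)
    slope = mlp(nets["slope_encoder"], inp)  # constant latent slope, shape (m,)
    tau = mlp(nets["time_net"], t[:, None])  # one latent time per latent variable, (n_t, m)
    z = z0 + slope * tau                     # analytic latent solution, no time stepping
    return mlp(nets["decoder"], z)           # physical states, shape (n_t, state_dim)

rng = np.random.default_rng(0)
state_dim, param_dim, latent_dim = 3, 2, 8
nets = {
    "init_encoder": init_mlp(rng, [state_dim + param_dim, 16, latent_dim]),
    "slope_encoder": init_mlp(rng, [state_dim + param_dim, 16, latent_dim]),
    "time_net": init_mlp(rng, [1, 16, latent_dim]),
    "decoder": init_mlp(rng, [latent_dim, 16, state_dim]),
}

x0 = np.array([1.0, 0.0, 0.0])
p = np.array([0.04, 1.0e4])
t = np.logspace(-5, 2, 20)      # widely separated time scales, queried in one pass
states = lilan_forward(nets, x0, p, t)
print(states.shape)
```

Because the latent solution is evaluated in closed form, predictions at all query times come from a single batched evaluation with no numerical integrator, which is the property the abstract emphasizes.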
LG Dec 30, 2021
A Unified and Constructive Framework for the Universality of Neural Networks
Tan Bui-Thanh
One of the reasons why many neural networks are capable of replicating complicated tasks or functions is their universal property. Though the past few decades have seen tremendous advances in theories of neural networks, a single constructive framework for neural network universality remains unavailable. This paper is the first effort to provide a unified and constructive framework for the universality of a large class of activation functions including most existing ones. At the heart of the framework is the concept of neural network approximate identity (nAI). The main result is: {\em any nAI activation function is universal}. It turns out that most existing activation functions are nAI, and thus universal in the space of continuous functions on compacta. The framework offers {\bf several advantages} over the contemporary counterparts. First, it is constructive with elementary means from functional analysis, probability theory, and numerical analysis. Second, it is the first unified attempt that is valid for most existing activation functions. Third, as a by-product, the framework provides the first universality proof for some of the existing activation functions including Mish, SiLU, ELU, GELU, etc. Fourth, it provides new proofs for most activation functions. Fifth, it discovers new activation functions with guaranteed universality property. Sixth, for a given activation and error tolerance, the framework provides precisely the architecture of the corresponding one-hidden-layer neural network with a predetermined number of neurons, and the values of weights/biases. Seventh, the framework allows us to abstractly present the first universal approximation with a favorable non-asymptotic rate.
ML · May 25, 2021
TNet: A Model-Constrained Tikhonov Network Approach for Inverse Problems
Hai V. Nguyen, Tan Bui-Thanh
Deep Learning (DL), in particular deep neural networks (DNN), by default is purely data-driven and in general does not require physics. This is the strength of DL but also one of its key limitations when applied to science and engineering problems in which underlying physical properties and desired accuracy need to be achieved. DL methods in their original forms are not capable of respecting the underlying mathematical models or achieving desired accuracy even in big-data regimes. However, many data-driven science and engineering problems, such as inverse problems, typically have limited experimental or observational data, and DL would overfit the data in this case. Leveraging information encoded in the underlying mathematical models, we argue, not only compensates for missing information in low-data regimes but also provides opportunities to equip DL methods with the underlying physics, hence promoting better generalization. This paper develops a model-constrained deep learning approach and its variant TNet, which are capable of learning information hidden in both the training data and the underlying mathematical models to solve inverse problems governed by partial differential equations. We provide the constructions and some theoretical results for the proposed approaches. We show that data randomization can enhance the smoothness of the networks and their generalization. Comprehensive numerical results not only confirm the theoretical findings but also show that, with as few as 20 training data samples for 1D deconvolution, 50 for the inverse 2D heat conductivity problem, and 100 and 50 for inverse initial conditions for the time-dependent 2D Burgers' equation and 2D Navier-Stokes equations, respectively, TNet solutions can be as accurate as Tikhonov solutions while being several orders of magnitude faster. This is possible owing to the model-constrained term, replications, and randomization.
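To make the comparison with Tikhonov solutions concrete, the sketch below writes out a Tikhonov-type objective with a model-constrained data-misfit term for the simplest possible case. Everything specific here is an assumption made for illustration and is not taken from the paper: the forward map is linear and diagonal (as a deconvolution operator is in a Fourier basis), the objective is $\tfrac12\|Gu - y\|^2 + \tfrac{\alpha}{2}\|u - u_0\|^2$, and the names `tikhonov_solution` and `model_constrained_loss` are hypothetical. In a network setting, `u_pred` would be the network output for the observation `y`.

```python
def tikhonov_solution(g, y, u0, alpha):
    """Minimizer of 0.5*|g*u - y|^2 + 0.5*alpha*|u - u0|^2 for diagonal g."""
    return [(gi * yi + alpha * u0i) / (gi * gi + alpha)
            for gi, yi, u0i in zip(g, y, u0)]

def model_constrained_loss(u_pred, g, y, u0, alpha):
    """Tikhonov-type objective evaluated at a candidate parameter u_pred.

    The first term pushes the forward model applied to the prediction to
    match the data; the second term pulls the prediction toward the prior u0.
    """
    misfit = sum((gi * ui - yi) ** 2 for gi, ui, yi in zip(g, u_pred, y))
    prior = sum((ui - u0i) ** 2 for ui, u0i in zip(u_pred, u0))
    return 0.5 * misfit + 0.5 * alpha * prior

# Toy data: three modes with decaying sensitivity, zero prior mean.
g = [1.0, 0.5, 0.1]
y = [1.0, 0.4, 0.05]
u0 = [0.0, 0.0, 0.0]
alpha = 0.01

u_star = tikhonov_solution(g, y, u0, alpha)
loss_star = model_constrained_loss(u_star, g, y, u0, alpha)
```

A network trained to minimize such an objective over many observations would amortize the per-observation solve into a single forward pass, which is one plausible reading of the speed-up reported in the abstract.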
COMP-PH · Dec 17, 2019
Accelerating PDE-constrained Inverse Solutions with Deep Learning and Reduced Order Models
Sheroze Sheriffdeen, Jean C. Ragusa, Jim E. Morel et al.
Inverse problems are pervasive mathematical methods for inferring knowledge from observational and experimental data by leveraging simulations and models. Unlike direct inference methods, inverse problem approaches typically require many forward model solves, usually governed by Partial Differential Equations (PDEs). This is a crucial bottleneck in determining the feasibility of such methods. While machine learning (ML) methods, such as deep neural networks (DNNs), can be employed to learn nonlinear forward models, designing a network architecture that preserves accuracy while generalizing to new parameter regimes is a daunting task. Furthermore, due to the computationally expensive nature of forward models, state-of-the-art black-box ML methods would require an unrealistic amount of work in order to obtain an accurate surrogate model. On the other hand, standard Reduced-Order Models (ROMs) accurately capture supposedly important physics of the forward model in the reduced subspaces, but otherwise could be inaccurate elsewhere. In this paper, we propose to enlarge the validity of ROMs and hence improve the accuracy outside the reduced subspaces by incorporating a data-driven ML technique. In particular, we focus on a goal-oriented approach that substantially improves the accuracy of reduced models by learning the error between the forward model and the ROM outputs. Once an ML-enhanced ROM is constructed, it can accelerate the solution of many-query problems in parametrized forward and inverse problems. Numerical results for inverse problems governed by elliptic PDEs and parametrized neutron transport equations will be presented to support our approach.
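The idea of learning the error between the forward model and the ROM outputs can be illustrated with a deliberately tiny stand-in. The choices below are assumptions made for the example and do not come from the paper: the "full model" is $e^p$, the "ROM" is its first-order truncation $1 + p$, and the learned correction is a two-term least-squares fit in place of a neural network. All function names are hypothetical.

```python
import math

def full_model(p):
    """Stand-in for an expensive forward solve."""
    return math.exp(p)

def rom(p):
    """Stand-in for a cheap reduced-order model (first-order truncation)."""
    return 1.0 + p

def fit_error_model(train_p):
    """Least-squares fit of the ROM error e(p) ~ c2*p^2 + c3*p^3.

    Solves the 2x2 normal equations by hand. A neural network would play
    this role in an ML-enhanced ROM.
    """
    errors = [full_model(p) - rom(p) for p in train_p]
    a11 = sum(p ** 4 for p in train_p)
    a12 = sum(p ** 5 for p in train_p)
    a22 = sum(p ** 6 for p in train_p)
    b1 = sum(p ** 2 * e for p, e in zip(train_p, errors))
    b2 = sum(p ** 3 * e for p, e in zip(train_p, errors))
    det = a11 * a22 - a12 * a12
    c2 = (b1 * a22 - b2 * a12) / det
    c3 = (a11 * b2 - a12 * b1) / det
    return lambda p: c2 * p ** 2 + c3 * p ** 3

# Train on a few full-model solves, then evaluate at unseen parameters.
train_p = [i / 10 for i in range(11)]
error_model = fit_error_model(train_p)

def enhanced_rom(p):
    """ROM output plus the learned correction."""
    return rom(p) + error_model(p)

test_p = [(i + 0.5) / 10 for i in range(10)]
rom_error = max(abs(full_model(p) - rom(p)) for p in test_p)
enhanced_error = max(abs(full_model(p) - enhanced_rom(p)) for p in test_p)
```

After the handful of training solves, each query costs one ROM evaluation plus one evaluation of the correction, which is what makes the approach attractive for many-query settings such as inverse problems.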
ML · Dec 5, 2019
Solving Bayesian Inverse Problems via Variational Autoencoders
Hwan Goh, Sheroze Sheriffdeen, Jonathan Wittmer et al.
In recent years, the field of machine learning has made phenomenal progress in the pursuit of simulating real-world data generation processes. One notable example of such success is the variational autoencoder (VAE). In this work, with a small shift in perspective, we leverage and adapt VAEs for a different purpose: uncertainty quantification in scientific inverse problems. We introduce UQ-VAE: a flexible, adaptive, hybrid data/model-informed framework for training neural networks capable of rapid modelling of the posterior distribution representing the unknown parameter of interest. Specifically, from divergence-based variational inference, our framework is derived such that most of the information usually present in scientific inverse problems is fully utilized in the training procedure. Additionally, this framework includes an adjustable hyperparameter that allows selection of the notion of distance between the posterior model and the target distribution. This introduces more flexibility in controlling how optimization directs the learning of the posterior model. Further, this framework possesses an inherent adaptive optimization property that emerges through the learning of the posterior uncertainty.
NA · Aug 2, 2017
A data scalable augmented Lagrangian KKT preconditioner for large scale inverse problems
Nick Alger, Umberto Villa, Tan Bui-Thanh et al.
Current state-of-the-art preconditioners for the reduced Hessian and the Karush-Kuhn-Tucker (KKT) operator for large scale inverse problems are typically based on approximating the reduced Hessian with the regularization operator. However, the quality of this approximation degrades with increasingly informative observations or data. Thus the best-case scenario from a scientific standpoint (fully informative data) is the worst-case scenario from a computational perspective. In this paper we present an augmented Lagrangian-type preconditioner based on a block diagonal approximation of the augmented upper left block of the KKT operator. The preconditioner requires solvers for two linear subproblems that arise in the augmented KKT operator, which we expect to be much easier to precondition than the reduced Hessian. Analysis of the spectrum of the preconditioned KKT operator indicates that the preconditioner is effective when the regularization is chosen appropriately. In particular, it is effective when the regularization does not over-penalize highly informed parameter modes and does not under-penalize uninformed modes. Finally, we present a numerical study for a large data/low noise Poisson source inversion problem, demonstrating the effectiveness of the preconditioner. In this example, three MINRES iterations on the KKT system with our preconditioner result in a reconstruction with better accuracy than 50 iterations of CG on the reduced Hessian system with regularization preconditioning.
NA · Jun 1, 2017
iHDG: An Iterative HDG Framework for Partial Differential Equations
Sriramkrishnan Muralikrishnan, Minh-Binh Tran, Tan Bui-Thanh
We present a scalable iterative solver for high-order hybridized discontinuous Galerkin (HDG) discretizations of linear partial differential equations. It is an interplay between domain decomposition methods and HDG discretizations, and hence inherits advances from both sides. In particular, the method can be viewed as a Gauss-Seidel approach that requires only independent element-by-element and face-by-face local solves in each iteration. As such, it is well-suited for current and future computing systems with massive concurrency. Unlike conventional Gauss-Seidel schemes, which are purely algebraic, the convergence of iHDG, thanks to the built-in HDG numerical flux, does not depend on the ordering of unknowns. We rigorously show the convergence of the proposed method for the transport equation, the linearized shallow water equation, and the convection-diffusion equation. For the transport equation, the method is convergent regardless of mesh size $h$ and solution order $p$, and furthermore the convergence rate is independent of the solution order. For the linearized shallow water and the convection-diffusion equations, we show that the convergence is conditional on both $h$ and $p$. Extensive steady and time-dependent numerical results for the 2D and 3D transport equations, the linearized shallow water equation, and the convection-diffusion equation are presented to verify the theoretical findings.