91.5NAMay 15
A Parallel and Adaptive Mesh-Free method for Heterogeneous Porous MediaKapil Chawla, Sanghyun Lee, Yeonjong Shin
Material properties such as permeability fields in heterogeneous porous media are often represented as discontinuous, piecewise constant data tied to a given spatial discretization. Such representations are inherently mesh-dependent, requiring interpolation or projection whenever they are transferred to a different discretization. In this work, we develop \emph{Parallel and Adaptive Mesh-Free Approximation (PAM)}, a mesh-independent framework that approximates discontinuous data by a continuous, closed-form function. The resulting approximation can be evaluated consistently across different geometries and numerical discretizations, while preserving sharp interface features. The proposed PAM framework employs radial basis functions (RBFs) to construct continuous approximations of discontinuous data. To accurately capture discontinuities, we incorporate Shepard-normalization, which stabilizes the approximation near sharp interfaces. The coefficients of the RBF expansion are determined via sparse regression, enabling automatic selection of the most relevant basis functions and promoting robust representations. In addition, we develop a novel adaptive refinement approach which further enriches the approximation in regions of rapid spatial variation. We provide a theoretical analysis showing that the proposed normalized RBF framework achieves arbitrarily small $L^1$ error in approximating discontinuous step functions. To enhance computational efficiency, the domain is partitioned into subdomains, and the reconstruction problem is solved independently on each subdomain in parallel. Numerical experiments demonstrate the accuracy, adaptivity, and scalability of the proposed method, including applications to challenging heterogeneous permeability fields.
OCOct 22, 2023
Randomized Forward Mode of Automatic Differentiation For Optimization AlgorithmsKhemraj Shukla, Yeonjong Shin
We present a randomized forward mode gradient (RFG) as an alternative to backpropagation. RFG is a random estimator for the gradient that is constructed based on the directional derivative along a random vector. The forward mode automatic differentiation (AD) provides an efficient computation of RFG. The probability distribution of the random vector determines the statistical properties of RFG. Through the second moment analysis, we found that the distribution with the smallest kurtosis yields the smallest expected relative squared error. By replacing gradient with RFG, a class of RFG-based optimization algorithms is obtained. By focusing on gradient descent (GD) and Polyak's heavy ball (PHB) methods, we present a convergence analysis of RFG-based optimization algorithms for quadratic functions. Computational experiments are presented to demonstrate the performance of the proposed algorithms and verify the theoretical findings.
36.8LGApr 3
WGFINNs: Weak formulation-based GENERIC formalism informed neural networks'Jun Sur Richard Park, Auroni Huque Hashim, Siu Wun Cheung et al.
Data-driven discovery of governing equations from noisy observations remains a fundamental challenge in scientific machine learning. While GENERIC formalism informed neural networks (GFINNs) provide a principled framework that enforces the laws of thermodynamics by construction, their reliance on strong-form loss formulations makes them highly sensitive to measurement noise. To address this limitation, we propose weak formulation-based GENERIC formalism informed neural networks (WGFINNs), which integrate the weak formulation of dynamical systems with the structure-preserving architecture of GFINNs. WGFINNs significantly enhance robustness to noisy data while retaining exact satisfaction of GENERIC degeneracy and symmetry conditions. We further incorporate a state-wise weighted loss and a residual-based attention mechanism to mitigate scale imbalance across state variables. Theoretical analysis contrasts quantitative differences between the strong-form and the weak-form estimators. Mainly, the strong-form estimator diverges as the time step decreases in the presence of noise, while the weak-form estimator can be accurate even with noisy data if test functions satisfy certain conditions. Numerical experiments demonstrate that WGFINNs consistently outperform GFINNs at varying noise levels, achieving more accurate predictions and reliable recovery of physical quantities.
CEMar 16, 2024
A Comprehensive Review of Latent Space Dynamics Identification Algorithms for Intrusive and Non-Intrusive Reduced-Order-ModelingChristophe Bonneville, Xiaolong He, April Tran et al.
Numerical solvers of partial differential equations (PDEs) have been widely employed for simulating physical systems. However, the computational cost remains a major bottleneck in various scientific and engineering applications, which has motivated the development of reduced-order models (ROMs). Recently, machine-learning-based ROMs have gained significant popularity and are promising for addressing some limitations of traditional ROM methods, especially for advection dominated systems. In this chapter, we focus on a particular framework known as Latent Space Dynamics Identification (LaSDI), which transforms the high-fidelity data, governed by a PDE, to simpler and low-dimensional latent-space data, governed by ordinary differential equations (ODEs). These ODEs can be learned and subsequently interpolated to make ROM predictions. Each building block of LaSDI can be easily modulated depending on the application, which makes the LaSDI framework highly flexible. In particular, we present strategies to enforce the laws of thermodynamics into LaSDI models (tLaSDI), enhance robustness in the presence of noise through the weak form (WLaSDI), select high-fidelity training data efficiently through active learning (gLaSDI, GPLaSDI), and quantify the ROM prediction uncertainty through Gaussian processes (GPLaSDI). We demonstrate the performance of different LaSDI approaches on Burgers equation, a non-linear heat conduction problem, and a plasma physics problem, showing that LaSDI algorithms can achieve relative errors of less than a few percent and up to thousands of times speed-ups.
LGMar 9, 2024
tLaSDI: Thermodynamics-informed latent space dynamics identificationJun Sur Richard Park, Siu Wun Cheung, Youngsoo Choi et al.
We propose a latent space dynamics identification method, namely tLaSDI, that embeds the first and second principles of thermodynamics. The latent variables are learned through an autoencoder as a nonlinear dimension reduction model. The latent dynamics are constructed by a neural network-based model that precisely preserves certain structures for the thermodynamic laws through the GENERIC formalism. An abstract error estimate is established, which provides a new loss formulation involving the Jacobian computation of autoencoder. The autoencoder and the latent dynamics are simultaneously trained to minimize the new loss. Computational examples demonstrate the effectiveness of tLaSDI, which exhibits robust generalization ability, even in extrapolation. In addition, an intriguing correlation is empirically observed between a quantity from tLaSDI in the latent space and the behaviors of the full-state solution.
LGJun 10, 2025
Thermodynamically Consistent Latent Dynamics Identification for Parametric SystemsXiaolong He, Yeonjong Shin, Anthony Gruber et al.
We propose an efficient thermodynamics-informed latent space dynamics identification (tLaSDI) framework for the reduced-order modeling of parametric nonlinear dynamical systems. This framework integrates autoencoders for dimensionality reduction with newly developed parametric GENERIC formalism-informed neural networks (pGFINNs), which enable efficient learning of parametric latent dynamics while preserving key thermodynamic principles such as free energy conservation and entropy generation across the parameter space. To further enhance model performance, a physics-informed active learning strategy is incorporated, leveraging a greedy, residual-based error indicator to adaptively sample informative training data, outperforming uniform sampling at equivalent computational cost. Numerical experiments on the Burgers' equation and the 1D/1V Vlasov-Poisson equation demonstrate that the proposed method achieves up to 3,528x speed-up with 1-3% relative errors, and significant reduction in training (50-90%) and inference (57-61%) cost. Moreover, the learned latent space dynamics reveal the underlying thermodynamic behavior of the system, offering valuable insights into the physical-space dynamics.
LGSep 15, 2025
Deep operator network for surrogate modeling of poroelasticity with random permeability fieldsSangjoon Park, Yeonjong Shin, Jinhyun Choo
Poroelasticity -- coupled fluid flow and elastic deformation in porous media -- often involves spatially variable permeability, especially in subsurface systems. In such cases, simulations with random permeability fields are widely used for probabilistic analysis, uncertainty quantification, and inverse problems. These simulations require repeated forward solves that are often prohibitively expensive, motivating the development of efficient surrogate models. However, efficient surrogate modeling techniques for poroelasticity with random permeability fields remain scarce. In this study, we propose a surrogate modeling framework based on the deep operator network (DeepONet), a neural architecture designed to learn mappings between infinite-dimensional function spaces. The proposed surrogate model approximates the solution operator that maps random permeability fields to transient poroelastic responses. To enhance predictive accuracy and stability, we integrate three strategies: nondimensionalization of the governing equations, input dimensionality reduction via Karhunen--Loéve expansion, and a two-step training procedure that decouples the optimization of branch and trunk networks. The methodology is evaluated on two benchmark problems in poroelasticity: soil consolidation and ground subsidence induced by groundwater extraction. In both cases, the DeepONet achieves substantial speedup in inference while maintaining high predictive accuracy across a wide range of permeability statistics. These results highlight the potential of the proposed approach as a scalable and efficient surrogate modeling technique for poroelastic systems with random permeability fields.
NASep 2, 2023
On the training and generalization of deep operator networksSanghyun Lee, Yeonjong Shin
We present a novel training method for deep operator networks (DeepONets), one of the most popular neural network models for operators. DeepONets are constructed by two sub-networks, namely the branch and trunk networks. Typically, the two sub-networks are trained simultaneously, which amounts to solving a complex optimization problem in a high dimensional space. In addition, the nonconvex and nonlinear nature makes training very challenging. To tackle such a challenge, we propose a two-step training method that trains the trunk network first and then sequentially trains the branch network. The core mechanism is motivated by the divide-and-conquer paradigm and is the decomposition of the entire complex training task into two subtasks with reduced complexity. Therein the Gram-Schmidt orthonormalization process is introduced which significantly improves stability and generalization ability. On the theoretical side, we establish a generalization error estimate in terms of the number of training data, the width of DeepONets, and the number of input and output sensors. Numerical examples are presented to demonstrate the effectiveness of the two-step training method, including Darcy flow in heterogeneous porous media.
DSAug 31, 2021
GFINNs: GENERIC Formalism Informed Neural Networks for Deterministic and Stochastic Dynamical SystemsZhen Zhang, Yeonjong Shin, George Em Karniadakis
We propose the GENERIC formalism informed neural networks (GFINNs) that obey the symmetric degeneracy conditions of the GENERIC formalism. GFINNs comprise two modules, each of which contains two components. We model each component using a neural network whose architecture is designed to satisfy the required conditions. The component-wise architecture design provides flexible ways of leveraging available physics information into neural networks. We prove theoretically that GFINNs are sufficiently expressive to learn the underlying equations, hence establishing the universal approximation theorem. We demonstrate the performance of GFINNs in three simulation problems: gas containers exchanging heat and volume, thermoelastic double pendulum and the Langevin dynamics. In all the examples, GFINNs outperform existing methods, hence demonstrating good accuracy in predictions for both deterministic and stochastic systems.
LGMay 20, 2021
Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functionsAmeya D. Jagtap, Yeonjong Shin, Kenji Kawaguchi et al.
We propose a new type of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions. KNNs employ the Kronecker product, which provides an efficient way of constructing a very wide network while keeping the number of parameters low. Our theoretical analysis reveals that under suitable conditions, KNNs induce a faster decay of the loss than that by the feed-forward networks. This is also empirically verified through a set of computational examples. Furthermore, under certain technical assumptions, we establish global convergence of gradient descent for KNNs. As a specific case, we propose the Rowdy activation function that is designed to get rid of any saturation region by injecting sinusoidal fluctuations, which include trainable parameters. The proposed Rowdy activation function can be employed in any neural network architecture like feed-forward neural networks, Recurrent neural networks, Convolutional neural networks etc. The effectiveness of KNNs with Rowdy activation is demonstrated through various computational experiments including function approximation using feed-forward neural networks, solution inference of partial differential equations using the physics-informed neural networks, and standard deep learning benchmark problems using convolutional and fully-connected neural networks.
OCApr 6, 2021
A Caputo fractional derivative-based algorithm for optimizationYeonjong Shin, Jérôme Darbon, George Em Karniadakis
We propose a novel Caputo fractional derivative-based optimization algorithm. Upon defining the Caputo fractional gradient with respect to the Cartesian coordinate, we present a generic Caputo fractional gradient descent (CFGD) method. We prove that the CFGD yields the steepest descent direction of a locally smoothed objective function. The generic CFGD requires three parameters to be specified, and a choice of the parameters yields a version of CFGD. We propose three versions -- non-adaptive, adaptive terminal and adaptive order. By focusing on quadratic objective functions, we provide a convergence analysis. We prove that the non-adaptive CFGD converges to a Tikhonov regularized solution. For the two adaptive versions, we derive error bounds, which show convergence to integer-order stationary point under some conditions. We derive an explicit formula of CFGD for quadratic functions. We computationally found that the adaptive terminal (AT) CFGD mitigates the dependence on the condition number in the rate of convergence and results in significant acceleration over gradient descent (GD). For non-quadratic functions, we develop an efficient implementation of CFGD using the Gauss-Jacobi quadrature, whose computational cost is approximately proportional to the number of the quadrature points and the cost of GD. Our numerical examples show that AT-CFGD results in acceleration over GD, even when a small number of the Gauss-Jacobi quadrature points (including a single point) is used.
LGJul 14, 2020
Plateau Phenomenon in Gradient Descent Training of ReLU networks: Explanation, Quantification and AvoidanceMark Ainsworth, Yeonjong Shin
The ability of neural networks to provide `best in class' approximation across a wide range of applications is well-documented. Nevertheless, the powerful expressivity of neural networks comes to naught if one is unable to effectively train (choose) the parameters defining the network. In general, neural networks are trained by gradient descent type optimization methods, or a stochastic variant thereof. In practice, such methods result in the loss function decreases rapidly at the beginning of training but then, after a relatively small number of steps, significantly slow down. The loss may even appear to stagnate over the period of a large number of epochs, only to then suddenly start to decrease fast again for no apparent reason. This so-called plateau phenomenon manifests itself in many learning tasks. The present work aims to identify and quantify the root causes of plateau phenomenon. No assumptions are made on the number of neurons relative to the number of training data, and our results hold for both the lazy and adaptive regimes. The main findings are: plateaux correspond to periods during which activation patterns remain constant, where activation pattern refers to the number of data points that activate a given neuron; quantification of convergence of the gradient flow dynamics; and, characterization of stationary points in terms solutions of local least squares regression lines over subsets of the training data. Based on these conclusions, we propose a new iterative training method, the Active Neuron Least Squares (ANLS), characterised by the explicit adjustment of the activation pattern at each step, which is designed to enable a quick exit from a plateau. Illustrative numerical examples are included throughout.
NAApr 3, 2020
On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEsYeonjong Shin, Jerome Darbon, George Em Karniadakis
Physics informed neural networks (PINNs) are deep learning based techniques for solving partial differential equations (PDEs) encounted in computational science and engineering. Guided by data and physical laws, PINNs find a neural network that approximates the solution to a system of PDEs. Such a neural network is obtained by minimizing a loss function in which any prior knowledge of PDEs and data are encoded. Despite its remarkable empirical success in one, two or three dimensional problems, there is little theoretical justification for PINNs. As the number of data grows, PINNs generate a sequence of minimizers which correspond to a sequence of neural networks. We want to answer the question: Does the sequence of minimizers converge to the solution to the PDE? We consider two classes of PDEs: linear second-order elliptic and parabolic. By adapting the Schauder approach and the maximum principle, we show that the sequence of minimizers strongly converges to the PDE solution in $C^0$. Furthermore, we show that if each minimizer satisfies the initial/boundary conditions, the convergence mode becomes $H^1$. Computational examples are provided to illustrate our theoretical findings. To the best of our knowledge, this is the first theoretical work that shows the consistency of PINNs.
LGOct 14, 2019
Effects of Depth, Width, and Initialization: A Convergence Analysis of Layer-wise Training for Deep Linear Neural NetworksYeonjong Shin
Deep neural networks have been used in various machine learning applications and achieved tremendous empirical successes. However, training deep neural networks is a challenging task. Many alternatives have been proposed in place of end-to-end back-propagation. Layer-wise training is one of them, which trains a single layer at a time, rather than trains the whole layers simultaneously. In this paper, we study a layer-wise training using a block coordinate gradient descent (BCGD) for deep linear networks. We establish a general convergence analysis of BCGD and found the optimal learning rate, which results in the fastest decrease in the loss. More importantly, the optimal learning rate can directly be applied in practice, as it does not require any prior knowledge. Thus, tuning the learning rate is not needed at all. Also, we identify the effects of depth, width, and initialization in the training process. We show that when the orthogonal-like initialization is employed, the width of intermediate layers plays no role in gradient-based training, as long as the width is greater than or equal to both the input and output dimensions. We show that under some conditions, the deeper the network is, the faster the convergence is guaranteed. This implies that in an extreme case, the global optimum is achieved after updating each weight matrix only once. Besides, we found that the use of deep networks could drastically accelerate convergence when it is compared to those of a depth 1 network, even when the computational cost is considered. Numerical examples are provided to justify our theoretical findings and demonstrate the performance of layer-wise training by BCGD.
LGJul 23, 2019
Trainability of ReLU networks and Data-dependent InitializationYeonjong Shin, George Em Karniadakis
In this paper, we study the trainability of rectified linear unit (ReLU) networks. A ReLU neuron is said to be dead if it only outputs a constant for any input. Two death states of neurons are introduced; tentative and permanent death. A network is then said to be trainable if the number of permanently dead neurons is sufficiently small for a learning task. We refer to the probability of a network being trainable as trainability. We show that a network being trainable is a necessary condition for successful training and the trainability serves as an upper bound of successful training rates. In order to quantify the trainability, we study the probability distribution of the number of active neurons at the initialization. In many applications, over-specified or over-parameterized neural networks are successfully employed and shown to be trained effectively. With the notion of trainability, we show that over-parameterization is both a necessary and a sufficient condition for minimizing the training loss. Furthermore, we propose a data-dependent initialization method in the over-parameterized setting. Numerical examples are provided to demonstrate the effectiveness of the method and our theoretical findings.
MLMar 15, 2019
Dying ReLU and Initialization: Theory and Numerical ExamplesLu Lu, Yeonjong Shin, Yanhui Su et al.
The dying ReLU refers to the problem when ReLU neurons become inactive and only output 0 for any input. There are many empirical and heuristic explanations of why ReLU neurons die. However, little is known about its theoretical analysis. In this paper, we rigorously prove that a deep ReLU network will eventually die in probability as the depth goes to infinite. Several methods have been proposed to alleviate the dying ReLU. Perhaps, one of the simplest treatments is to modify the initialization procedure. One common way of initializing weights and biases uses symmetric probability distributions, which suffers from the dying ReLU. We thus propose a new initialization procedure, namely, a randomized asymmetric initialization. We prove that the new initialization can effectively prevent the dying ReLU. All parameters required for the new initialization are theoretically designed. Numerical examples are provided to demonstrate the effectiveness of the new initialization procedure.