Quanhui Zhu

h-index: 3
2 papers


8.3 · NA · Mar 14
Energy Dissipation Preserving Feature-based DNN Galerkin Methods for Gradient Flows

Tao Tang, Jiang Yang, Yuxiang Zhao et al.

In recent years, deep learning methods, exemplified by Physics-Informed Neural Networks (PINNs), have been widely applied to the numerical solution of differential equations. However, these methods may suffer from limited accuracy, high training costs, and a lack of robustness; in particular, they cannot preserve the intrinsic physical structures of continuous PDE models, such as the energy dissipation property of gradient flow systems. To address these challenges, we propose a feature-based Deep Neural Network Galerkin (DNN-G) framework designed for structure-preserving simulations of gradient flows. Instead of treating neural networks merely as optimization-driven solvers, we employ them as adaptive feature generators that define nonlinear trial spaces within a Galerkin projection formulation. This formulation guarantees semi-discrete energy dissipation and can be naturally combined with energy-stable time integration schemes. Several strategies for constructing neural basis functions are investigated, including random features, structured initialization, and problem-informed pre-training. Numerical experiments demonstrate that the proposed method remains robustly energy stable in high-dimensional settings and accurately captures complex topological transitions. With equivalent degrees of freedom, the DNN-G framework achieves higher accuracy than classical spectral methods, highlighting the effectiveness of neural feature representations for the numerical solution of partial differential equations.
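
The semi-discrete energy dissipation claimed in the abstract can be illustrated with the standard Galerkin argument. The sketch below is not taken from the paper: it assumes an $L^2$ gradient flow with energy $E$, and a trial space $V_h$ spanned by neural features that are held fixed in time, so that $\partial_t u_h \in V_h$. The symbols $u_h$, $V_h$, and $E$ are illustrative names.

```latex
\begin{aligned}
&\text{Gradient flow:} && u_t = -\frac{\delta E}{\delta u}, \\
&\text{Galerkin projection:} && (\partial_t u_h, v) = -\Big(\frac{\delta E}{\delta u}(u_h),\, v\Big) \quad \forall\, v \in V_h, \\
&\text{Test with } v = \partial_t u_h: && \frac{d}{dt} E(u_h) = \Big(\frac{\delta E}{\delta u}(u_h),\, \partial_t u_h\Big) = -\|\partial_t u_h\|^2 \le 0 .
\end{aligned}
```

Under these assumptions the discrete energy is non-increasing regardless of how the features were generated, which is why the choice of basis (random, structured, or pre-trained) affects accuracy but not dissipation.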

LG · Dec 6, 2024
$ε$-rank and the Staircase Phenomenon: New Insights into Neural Network Training Dynamics

Jiang Yang, Yuxiang Zhao, Quanhui Zhu

Understanding the training dynamics of deep neural networks (DNNs), particularly how they evolve low-dimensional features from high-dimensional data, remains a central challenge in deep learning theory. In this work, we introduce the concept of $ε$-rank, a novel metric quantifying the effective features of the neuron functions in the terminal hidden layer. Through extensive experiments across diverse tasks, we observe a universal staircase phenomenon: during training with standard stochastic gradient descent methods, the decline of the loss function is accompanied by an increase in the $ε$-rank and exhibits a staircase pattern. Theoretically, we rigorously prove a negative correlation between the lower bound of the loss and the $ε$-rank, demonstrating that a high $ε$-rank is essential for significant loss reduction. Moreover, numerical evidence shows that within the same deep neural network, the $ε$-rank of each hidden layer is higher than that of the preceding hidden layer. Based on these observations, to eliminate the staircase phenomenon, we propose a novel pre-training strategy on the initial hidden layer that elevates the $ε$-rank of the terminal hidden layer. Numerical experiments validate its effectiveness in reducing training time and improving accuracy across various tasks. The $ε$-rank is therefore a computable quantity that serves as an intrinsic, effective metric for deep neural networks, providing a novel perspective for understanding the training dynamics of neural networks and offering a theoretical foundation for designing efficient training strategies in practical applications.
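
The abstract does not state the formal definition of $ε$-rank, so the following is only a minimal sketch of one plausible reading: evaluate the last hidden layer's neurons on a batch of sample points and count the singular values of the resulting matrix that exceed a relative threshold. The function name `epsilon_rank`, the relative (rather than absolute) threshold, and the toy data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def epsilon_rank(features: np.ndarray, eps: float = 1e-6) -> int:
    """Count singular values of `features` larger than eps * (largest one).

    `features` has shape (n_samples, n_neurons): column j holds neuron j of
    the terminal hidden layer evaluated at the sample points.
    The relative threshold is an assumption made for this sketch.
    """
    singular_values = np.linalg.svd(features, compute_uv=False)
    if singular_values.size == 0 or singular_values[0] == 0.0:
        return 0
    return int(np.sum(singular_values > eps * singular_values[0]))


# Toy data: 16 "neurons" that are all linear combinations of only
# 2 underlying directions, so the effective rank should be 2.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2))
collapsed_features = latent @ rng.standard_normal((2, 16))

# Toy data: 4 mutually orthogonal neuron responses.
independent_features = np.eye(4)

rank_collapsed = epsilon_rank(collapsed_features)
rank_independent = epsilon_rank(independent_features)
```

Tracking such a count during training is one way to look for the staircase pattern the abstract describes, with jumps in the count expected to coincide with drops in the loss.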