LG · Aug 1, 2024

Convergence Analysis of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks

arXiv:2408.00573v4 · 6 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses convergence issues in training PINNs for scientific computing. Its contribution is a theoretical improvement that is likely incremental for practical applications.

The paper tackles the slow convergence of gradient descent when training over-parameterized physics-informed neural networks. It shows that natural gradient descent admits an $\mathcal{O}(1)$ learning rate at which the convergence rate is independent of the Gram matrix, and that the convergence is quadratic for smooth activations. Numerical experiments support the analysis.

In the context of over-parameterization, there is a line of work demonstrating that randomly initialized (stochastic) gradient descent (GD) converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. However, the learning rate of GD for training two-layer neural networks exhibits poor dependence on the sample size and the Gram matrix, leading to a slow training process. In this paper, we show that for training two-layer $\text{ReLU}^3$ Physics-Informed Neural Networks (PINNs), the learning rate can be improved from $\mathcal{O}(\lambda_0)$ to $\mathcal{O}(1/\|\bm{H}^{\infty}\|_2)$, implying that GD actually enjoys a faster convergence rate. Despite such improvements, the convergence rate is still tied to the least eigenvalue of the Gram matrix, leading to slow convergence. We then develop the positive definiteness of Gram matrices with general smooth activation functions and provide the convergence analysis of natural gradient descent (NGD) in training two-layer PINNs, demonstrating that the learning rate can be $\mathcal{O}(1)$ and at this rate, the convergence rate is independent of the Gram matrix. In particular, for smooth activation functions, the convergence rate of NGD is quadratic. Numerical experiments are conducted to verify our theoretical results.
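
To make the contrast between GD and NGD concrete, the following is a minimal sketch on a linear surrogate, not the two-layer PINN analyzed in the paper. It assumes a model whose residual is exactly linear in the parameters, $r(\theta) = J\theta - y$, with Gram matrix $H = JJ^\top$. GD then updates $\theta \leftarrow \theta - \eta J^\top r$, so the residual evolves as $r \leftarrow (I - \eta H)r$, while NGD updates $\theta \leftarrow \theta - \eta J^\top H^{-1} r$, so $r \leftarrow (1 - \eta)r$. The matrix `J`, the targets `y`, the step size `1 / trace(H)` (a stand-in for $1/\|H\|_2$), and all function names are hypothetical choices for illustration. The code uses only the Python standard library.

```python
# Toy GD vs. NGD comparison on a linear surrogate (an assumption, not the paper's PINN).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram(J):
    """Gram matrix H = J J^T built from the rows of J (n x n)."""
    return [[dot(ri, rj) for rj in J] for ri in J]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (tiny dense systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def residual(J, theta, y):
    return [dot(row, theta) - yi for row, yi in zip(J, y)]

def loss(J, theta, y):
    return 0.5 * sum(r * r for r in residual(J, theta, y))

def step(J, theta, y, lr, natural):
    """One GD step, or one NGD step when natural=True."""
    r = residual(J, theta, y)
    if natural:
        r = solve(gram(J), r)  # precondition the residual by H^{-1}
    # Update direction J^T r (GD) or J^T H^{-1} r (NGD).
    d = [sum(J[i][j] * r[i] for i in range(len(J))) for j in range(len(theta))]
    return [t - lr * dj for t, dj in zip(theta, d)]

# Hypothetical Jacobian: rows 0 and 1 are nearly collinear, so H is ill-conditioned.
J = [[1.0, 0.0, 0.0, 1.0],
     [1.0, 0.1, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
y = [1.0, 0.0, 1.0]
H = gram(J)
# trace(H) >= ||H||_2 for a positive semidefinite H, so this step size is safe for GD.
gd_lr = 1.0 / sum(H[i][i] for i in range(len(H)))

theta_gd = [0.0] * 4
for _ in range(100):
    theta_gd = step(J, theta_gd, y, gd_lr, natural=False)
theta_ngd = step(J, [0.0] * 4, y, 1.0, natural=True)

print("GD loss after 100 steps:", loss(J, theta_gd, y))
print("NGD loss after 1 step:  ", loss(J, theta_ngd, y))
```

In this linear setting the NGD residual shrinks by the factor $(1 - \eta)$ per step whatever the conditioning of $H$, whereas the slowest GD mode shrinks by $1 - \eta\lambda_{\min}(H)$. The quadratic convergence stated in the abstract concerns the nonlinear two-layer case with smooth activations, which this sketch does not reproduce.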

