Shuzhen Yang

h-index: 1
Papers: 2

2 Papers

NA | Oct 10, 2013
The Numerical Properties of G-heat equation and Related Application

Xiaolin Gong, Shuzhen Yang

In this paper, we consider the numerical convergence of the G-heat equation, which was first introduced by Peng. The G-heat equation extends the classical heat equation to incorporate uncertain volatility. Because the G-heat equation is a nonlinear partial differential equation (PDE), we prove that the Newton iteration converges and that the fully implicit discretization is monotone and stable. It follows that the fully implicit discretization converges to the viscosity solution of the G-heat equation.
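As an illustration of the kind of scheme the abstract describes, here is a minimal sketch of a fully implicit finite-difference step for the one-dimensional G-heat equation u_t = G(u_xx), with G(a) = 0.5 * (sig_max^2 * max(a, 0) - sig_min^2 * max(-a, 0)). The abstract does not give the discretization, so everything here is an assumption: the function name `g_heat_implicit`, the uniform grid, the Dirichlet `boundary` callback, and the inner iteration, which freezes the volatility according to the sign of the current second difference (a Newton-type iteration commonly used for uncertain-volatility PDEs). It is not presented as the authors' method.

```python
import numpy as np

def g_heat_implicit(u0, x, T, n_steps, sig_min, sig_max, boundary,
                    tol=1e-10, max_iter=50):
    """Fully implicit scheme for u_t = G(u_xx) on a uniform grid x.

    boundary(t, xb) is a hypothetical callback returning the Dirichlet
    value at boundary point xb and time t.
    """
    dx = x[1] - x[0]
    dt = T / n_steps
    m = len(x)
    idx = np.arange(1, m - 1)          # interior node indices
    u = np.asarray(u0, dtype=float).copy()

    for n in range(1, n_steps + 1):
        t_new = n * dt
        rhs = u.copy()
        rhs[0] = boundary(t_new, x[0])
        rhs[-1] = boundary(t_new, x[-1])

        v = u.copy()                   # initial guess: previous time level
        for _ in range(max_iter):
            # Second difference of the current iterate at interior nodes.
            d2 = (v[:-2] - 2.0 * v[1:-1] + v[2:]) / dx**2
            # G picks sig_max where the iterate is convex, sig_min where concave.
            sig2 = np.where(d2 >= 0.0, sig_max**2, sig_min**2)
            lam = 0.5 * dt * sig2 / dx**2

            # Implicit system with frozen volatility; boundary rows stay identity.
            A = np.eye(m)
            A[idx, idx - 1] = -lam
            A[idx, idx] = 1.0 + 2.0 * lam
            A[idx, idx + 1] = -lam

            v_new = np.linalg.solve(A, rhs)
            converged = np.max(np.abs(v_new - v)) < tol
            v = v_new
            if converged:
                break
        u = v
    return u
```

A simple sanity check under these assumptions: for the convex initial condition u0 = x^2, the exact solution is x^2 + sig_max^2 * t, and for the concave u0 = -x^2 it is -x^2 - sig_min^2 * t. The scheme reproduces both on the grid because the second difference of a quadratic is exact.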

LGOct 17, 2025
Early-stopping for Transformer model training

Jing He, Hua Jiang, Cheng Li et al.

This work introduces a novel theoretical framework grounded in Random Matrix Theory (RMT) for analyzing Transformer training dynamics. We focus on the underlying mechanisms that drive performance improvements and derive principled early-stopping criteria. Empirically, we observe that the spectral density of the shallow self-attention matrix V consistently evolves into a heavy-tailed distribution. Utilizing the PL (Power Law) fit to this matrix as a probe, we demarcate training into three stages: structural exploration, heavy-tailed structure stabilization, and convergence saturation. This staging provides guidance for preliminary stopping decisions. Crucially, we propose two consistent and validation-free criteria: a quantitative metric for heavy-tailed dynamics and a novel spectral signature indicative of convergence. The strong alignment between these criteria highlights the utility of RMT for monitoring and diagnosing the progression of Transformer model training.
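To make the idea of a power-law probe concrete, the following is a minimal sketch of estimating a tail exponent from the empirical spectral density of a weight matrix. The abstract does not say how the PL fit is computed, so this is an assumption throughout: the function names `hill_alpha` and `esd_tail_alpha` are hypothetical, the exponent uses the standard maximum-likelihood (Hill) estimator for a continuous power law, and the tail cutoff is simply a fixed fraction of the largest eigenvalues rather than a fitted x_min.

```python
import numpy as np

def hill_alpha(values, x_min):
    """MLE of alpha for a density p(x) ~ x**(-alpha) on x >= x_min."""
    tail = np.asarray(values, dtype=float)
    tail = tail[tail >= x_min]
    return 1.0 + tail.size / np.sum(np.log(tail / x_min))

def esd_tail_alpha(weight, tail_fraction=0.5):
    """Tail exponent of the eigenvalue spectrum of W^T W / n_rows.

    tail_fraction is an illustrative choice, not a value from the paper.
    """
    w = np.asarray(weight, dtype=float)
    # Eigenvalues of W^T W / n_rows are the squared singular values over n_rows.
    sv = np.linalg.svd(w, compute_uv=False)
    eigs = np.sort(sv**2 / w.shape[0])
    k = max(2, int(tail_fraction * eigs.size))
    tail = eigs[-k:]                   # the k largest eigenvalues
    return hill_alpha(tail, tail[0])

# Usage on a random matrix standing in for a trained weight matrix.
rng = np.random.default_rng(0)
alpha = esd_tail_alpha(rng.standard_normal((64, 32)))
```

Tracking such an exponent across checkpoints is one way a heavy-tailed metric could be monitored without a validation set; the thresholds that separate training stages would have to come from the paper itself.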