30.2 NA Mar 29
Stability Analysis of Monolithic Globally Divergence-Free ALE-HDG Methods for Fluid-Structure Interaction
Shuaijun Liu, Xiaoping Xie
In this paper, we propose two monolithic fully discrete finite element methods for fluid-structure interaction (FSI) based on a novel Piola-type Arbitrary Lagrangian-Eulerian (ALE) mapping. For the temporal discretization, we apply the backward Euler method to both the non-conservative and conservative formulations. For the spatial discretization, we adopt arbitrary-order hybridizable discontinuous Galerkin (HDG) methods for the incompressible Navier-Stokes and linear elasticity equations, and a continuous Galerkin (CG) method for the fluid mesh movement. We derive stability results for both the temporal semi-discretization and the full discretization, and show that the velocity approximations of the fully discrete schemes are globally divergence-free. Several numerical experiments are performed to verify the performance of the proposed methods.
LG Sep 1, 2025
GradES: Significantly Faster Training in Transformers with Gradient-Based Early Stopping
Qifu Wen, Xi Zeng, Zihan Zhou et al.
Early stopping monitors global validation loss and halts all parameter updates simultaneously, which is computationally costly for large transformers due to the extended time required for validation inference. We propose GradES, a novel gradient-based early stopping approach that operates within transformer components (attention projections and feed-forward layer matrices). We found that different components converge at varying rates during fine-tuning for both language and vision-language models. GradES tracks the magnitude of gradient changes in backpropagation for these matrices during training. When the magnitude of gradient changes for a projection matrix falls below a convergence threshold τ, we exclude that matrix from further updates individually, eliminating costly validation passes while allowing slow-converging matrices to continue learning. GradES speeds up training by 1.57-7.22× while simultaneously enhancing generalization through early prevention of overfitting, resulting in 1.2% higher average accuracy on language tasks and 3.88% on multimodal benchmarks.
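The per-matrix freezing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ComponentFreezer` and `train_with_component_freezing`, the choice of mean absolute difference between consecutive gradients as the "magnitude of gradient changes", and the toy quadratic losses are all assumptions made for illustration. Only the threshold τ (here `tau`) and the rule "stop updating a matrix once its gradient change falls below τ" come from the abstract.

```python
import numpy as np


class ComponentFreezer:
    """Hypothetical helper: tracks per-matrix gradient change and freezes
    a matrix once that change drops below the threshold tau."""

    def __init__(self, names, tau):
        self.tau = tau
        self.prev_grad = {name: None for name in names}
        self.frozen = {name: False for name in names}

    def update(self, name, grad):
        """Record the latest gradient; return True if the matrix is frozen."""
        if self.frozen[name]:
            return True
        prev = self.prev_grad[name]
        self.prev_grad[name] = grad.copy()
        if prev is None:
            return False  # two gradients are needed to measure a change
        # Assumed metric: mean absolute change between consecutive gradients.
        change = np.abs(grad - prev).mean()
        if change < self.tau:
            self.frozen[name] = True
        return self.frozen[name]


def train_with_component_freezing(weights, grad_fn, lr=0.1, tau=1e-3, max_steps=1000):
    """Plain gradient descent that skips matrices once they are frozen.
    Returns a dict mapping each frozen matrix name to the step it froze at."""
    freezer = ComponentFreezer(weights.keys(), tau)
    freeze_steps = {}
    for step in range(max_steps):
        for name, w in weights.items():
            if freezer.frozen[name]:
                continue  # no gradient computation or update for frozen matrices
            g = grad_fn(name, w)
            if freezer.update(name, g):
                freeze_steps[name] = step
            else:
                w -= lr * g  # in-place update of the still-active matrix
        if all(freezer.frozen.values()):
            break  # every component has converged; no validation pass needed
    return freeze_steps


# Toy setup (assumption): two matrices with quadratic losses of different
# curvature, so they converge at different rates.
curvature = {"attn_proj": 5.0, "ffn_proj": 0.5}
target = np.ones((4, 4))
weights = {name: np.zeros((4, 4)) for name in curvature}


def toy_grad(name, w):
    # Gradient of 0.5 * curvature * ||w - target||^2
    return curvature[name] * (w - target)


freeze_steps = train_with_component_freezing(weights, toy_grad)
print(freeze_steps)
```

In this toy run the high-curvature matrix stops changing its gradient sooner and is frozen first, while the low-curvature matrix keeps training, which mirrors the abstract's observation that components converge at different rates. In a real transformer, the same bookkeeping would sit between the backward pass and the optimizer step.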