DC · Jun 11

High-Order Spectral Element Methods for Wave Propagation on ARM Multicore CPU with SME: Optimizations and Implications

Yinuo Wang, Lin Gan, Tianqi Mao, Wubing Wan, Zekun Yin, Wenqiang Wang, Wei Xue, Guangwen Yang

arXiv:2606.12850v18.2

Predicted impact: top 36% in DC · last 90 days

Originality: Incremental advance

AI Analysis

For HPC practitioners running SEM codes on emerging ARM architectures, this work shows that SME improves kernel efficiency and also changes which discretization, i.e. which (h, p) combination, is fastest at a given accuracy.

The paper optimizes the spectral element method (SEM) code SPECFEM3D for wave propagation on ARM multicore CPUs with the Scalable Matrix Extension (SME). At fixed polynomial order it reports a 4-6x full-application speedup over the original code, and its results suggest that SME shifts the performance-favorable polynomial order upward at equal dispersion-based accuracy.

Wave propagation based on the spectral element method (SEM) is a representative HPC workload, but existing SEM implementations are not well matched to emerging ARM multicore CPUs with Scalable Matrix Extension (SME). We present an SME-enabled optimization of \textsc{SPECFEM3D} on the emerging LX2 processor that combines an SME-aware batched small-matrix kernel for SEM tensor-product operators, a memory-aware hybrid MPI+OpenMP execution scheme for limited-HBM systems, and a dispersion-based iso-accuracy study of the $(h,p)$ tradeoff. At fixed polynomial order, the optimized implementation improves full-application performance by 4--6$\times$ over the original code and delivers clear gains over optimized non-SME CPU baselines. Beyond these implementation-level gains, our results suggest that SME shifts the performance-favorable operating point toward higher polynomial orders along the dispersion-based iso-accuracy frontier, further reducing time-to-solution and working-set size. These results indicate that SME affects not only kernel efficiency, but also the practical discretization tradeoff for SEM on modern ARM multicore platforms.
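The "batched small-matrix kernel for SEM tensor-product operators" mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' kernel and uses no SME instructions; it only shows, in plain Python, why a tensor-product derivative on one element reduces to a small dense matrix product that is repeated over many elements. All names (`matmul`, `deriv_xi_as_gemm`, `deriv_xi_reference`, `batched_deriv_xi`) and the data layout `u[i][j][k]` are assumptions made for this illustration.

```python
def matmul(a, b):
    """Dense product of two row-major list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def deriv_xi_as_gemm(D, u):
    """Apply the 1D derivative matrix D along the first index of u.

    u[i][j][k] holds nodal values on one element with n = p + 1 nodes per
    direction. Flattening (j, k) turns the operation into one small
    (n x n) @ (n x n^2) matrix product.
    """
    n = len(D)
    U = [[u[i][j][k] for j in range(n) for k in range(n)] for i in range(n)]
    W = matmul(D, U)
    return [[[W[i][j * n + k] for k in range(n)] for j in range(n)]
            for i in range(n)]


def deriv_xi_reference(D, u):
    """Same operation written as the explicit sum over l of D[i][l] * u[l][j][k]."""
    n = len(D)
    return [[[sum(D[i][l] * u[l][j][k] for l in range(n)) for k in range(n)]
             for j in range(n)] for i in range(n)]


def batched_deriv_xi(D, elements):
    """Apply the same small product to every element in a batch.

    All elements share the reference-element matrix D, so an optimized
    kernel could keep D resident and stream the element data through it.
    """
    return [deriv_xi_as_gemm(D, u) for u in elements]
```

With n = p + 1 nodes per direction, each of the three derivative directions costs about 2n^4 floating-point operations per element for n^3 nodal values, so the arithmetic per node grows linearly with p. This is one plausible reason, under the assumptions of this sketch, why hardware with fast small-matrix products would favor higher polynomial orders.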
