LG Feb 2
Dissecting Outlier Dynamics in LLM NVFP4 Pretraining
Peijie Dong, Ruibo Fan, Yuechen Tao et al.
Training large language models using 4-bit arithmetic enhances throughput and memory efficiency. Yet, the limited dynamic range of FP4 increases sensitivity to outliers. While NVFP4 mitigates quantization error via hierarchical microscaling, a persistent loss gap remains compared to BF16. This study conducts a longitudinal analysis of outlier dynamics across architectures during NVFP4 pretraining, focusing on where they localize, why they occur, and how they evolve temporally. We find that, compared with Softmax Attention (SA), Linear Attention (LA) reduces per-tensor heavy tails but still exhibits persistent block-level spikes under block quantization. Our analysis attributes outliers to specific architectural components: Softmax in SA, gating in LA, and SwiGLU in FFN, with "post-QK" operations exhibiting higher sensitivity to quantization. Notably, outliers evolve from transient spikes early in training to a small set of persistent hot channels (i.e., channels with persistently large magnitudes) in later stages. Based on these findings, we introduce Hot-Channel Patch (HCP), an online compensation mechanism that identifies hot channels and reinjects residuals using hardware-efficient kernels. We then develop CHON, an NVFP4 training recipe integrating HCP with post-QK operation protection. On a GLA-1.3B model trained for 60B tokens, CHON reduces the loss gap to BF16 from 0.94% to 0.58% while maintaining downstream accuracy.
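To make the block-level outlier problem concrete, the following is a minimal sketch of block-scaled FP4 fake quantization, not the paper's kernels or its HCP mechanism. It assumes the E2M1 value grid and 16-element blocks commonly described for NVFP4, and it simplifies by keeping the per-block scale in full precision instead of FP8 and omitting the per-tensor scale. The function names and the outlier value of 200 are hypothetical.

```python
import random

# Representable magnitudes of the FP4 E2M1 format.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # assumed block size for NVFP4-style microscaling


def round_to_fp4(v):
    """Round v to the nearest signed E2M1 value (ties go to the smaller magnitude)."""
    mag = min(FP4_GRID, key=lambda g: abs(abs(v) - g))
    return mag if v >= 0 else -mag


def quantize_block(block):
    """Fake-quantize one block: scale so max |x| maps to 6, round, rescale."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return list(block)
    scale = amax / 6.0  # simplification: full-precision scale, not FP8
    return [round_to_fp4(v / scale) * scale for v in block]


def mean_abs_error(xs, qs, skip=()):
    """Mean absolute quantization error, ignoring the indices in `skip`."""
    idx = [i for i in range(len(xs)) if i not in skip]
    return sum(abs(xs[i] - qs[i]) for i in idx) / len(idx)


random.seed(0)
clean = [random.gauss(0.0, 1.0) for _ in range(BLOCK)]
spiky = list(clean)
spiky[3] = 200.0  # hypothetical outlier sharing a block with small values

# Compare the error on the 15 ordinary elements with and without the spike.
err_clean = mean_abs_error(clean, quantize_block(clean), skip={3})
err_spiky = mean_abs_error(spiky, quantize_block(spiky), skip={3})
print(f"error without outlier: {err_clean:.4f}")
print(f"error with outlier:    {err_spiky:.4f}")
```

Because the single large value sets the scale for the whole block, its neighbours fall below the smallest nonzero grid step and are rounded to zero, which is one way a persistent hot channel can degrade an entire block.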
LG Aug 4, 2025
Physics-Embedded Neural ODEs for Sim2Real Edge Digital Twins of Hybrid Power Electronics Systems
Jialin Zheng, Haoyu Wang, Yangbin Zeng et al.
Edge Digital Twins (EDTs) are crucial for monitoring and control of Power Electronics Systems (PES). However, existing modeling approaches struggle to consistently capture the continuously evolving hybrid dynamics inherent in PES, degrading Sim-to-Real generalization on resource-constrained edge devices. To address these challenges, this paper proposes Physics-Embedded Neural ODEs (PENODE), which (i) embed the hybrid operating mechanism as an event automaton to explicitly govern discrete switching and (ii) inject known governing ODE components directly into the neural parameterization of unmodeled dynamics. This unified design yields a differentiable, end-to-end trainable architecture that preserves physical interpretability while reducing redundancy, and it supports a cloud-to-edge toolchain for efficient FPGA deployment. Experimental results demonstrate that PENODE achieves significantly higher accuracy on benchmarks across white-box, gray-box, and black-box scenarios, with a 75% reduction in neuron count, validating that PENODE maintains physical interpretability while enabling efficient edge deployment and enhanced real-time control.
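The two ingredients named above, an event automaton for discrete switching and a known ODE combined with a learned term, can be sketched as follows. This is an illustrative toy and not the PENODE architecture: the buck converter, its parameter values, the fixed-duty PWM automaton, the forward-Euler integrator, and all function names are assumptions, and the residual network is left untrained with zero weights.

```python
import math

# Hypothetical buck-converter parameters (illustrative values only).
VIN, L, C, R = 12.0, 1e-3, 1e-4, 10.0
F_SW, DUTY = 20e3, 0.5  # switching frequency (Hz) and duty cycle


def switch_state(t):
    """Two-mode event automaton: ON for the first DUTY fraction of each period."""
    phase = (t * F_SW) % 1.0
    return 1 if phase < DUTY else 0


def known_dynamics(state, s):
    """Known governing ODE of an ideal buck converter (inductor current, output voltage)."""
    i, v = state
    di = (s * VIN - v) / L
    dv = (i - v / R) / C
    return di, dv


def neural_residual(state, s, weights):
    """Tiny one-hidden-layer network standing in for the unmodeled dynamics."""
    w1, w2 = weights
    inputs = (state[0], state[1], float(s))
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in w1]
    return tuple(sum(w * h for w, h in zip(row, hidden)) for row in w2)


def simulate(weights, t_end=20e-3, dt=1e-6):
    """Forward-Euler rollout of d(state)/dt = known(state, s) + residual(state, s)."""
    state = (0.0, 0.0)
    for k in range(int(round(t_end / dt))):
        s = switch_state(k * dt)  # discrete mode chosen by the automaton
        f = known_dynamics(state, s)
        g = neural_residual(state, s, weights)
        state = (state[0] + dt * (f[0] + g[0]), state[1] + dt * (f[1] + g[1]))
    return state


# With zero weights the residual vanishes and only the known physics remains.
zero_weights = ([[0.0] * 3 for _ in range(4)], [[0.0] * 4 for _ in range(2)])
i_final, v_final = simulate(zero_weights)
print(f"inductor current: {i_final:.3f} A, output voltage: {v_final:.3f} V")
```

With the residual switched off, the rollout settles near the ideal average output of DUTY × VIN = 6 V, so any trained residual would only need to account for what the known physics misses.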
SY Jul 3, 2025
Neural Substitute Solver for Efficient Edge Inference of Power Electronic Hybrid Dynamics
Jialin Zheng, Haoyu Wang, Yangbin Zeng et al.
Advancing the dynamics inference of power electronic systems (PES) to the real-time edge side holds transformative potential for testing, control, and monitoring. However, efficiently inferring the inherent hybrid continuous-discrete dynamics on resource-constrained edge hardware remains a significant challenge. This letter proposes a neural substitute solver (NSS) approach, a neural-network-based framework aimed at rapid, accurate inference with significantly reduced computational cost. Specifically, NSS leverages lightweight neural networks to substitute the time-consuming matrix operations and high-order numerical integration steps in traditional solvers, which transforms sequential bottlenecks into highly parallel operations suitable for edge hardware. Experimental validation on a multi-stage DC-DC converter demonstrates that NSS achieves a 23x speedup and a 60% hardware resource reduction compared to traditional solvers, paving the way for deploying edge inference of high-fidelity PES dynamics.