51.5 LG May 25
Max-Window Scale Estimation for Near-Lossless HiF8 W8A8 Quantization-Aware Training
Yingying Cheng, Jinquan Shi, Li Zhou et al.
Quantization-aware training (QAT) with low-bit floating-point formats enables efficient LLM deployment, yet introduces subtle failure modes invisible to standard training metrics. We present a systematic study of HiF8 W8A8 QAT for OpenPangu-Embedded-1B through the lens of Delayed Tensor Scaling (DTS). Across eight controlled experiments, we identify and disentangle two orthogonal failure modes: (i) amax saturation, where delayed scale estimates silently corrupt knowledge-sensitive representations via forward-pass clipping, and (ii) catastrophic forgetting, where an aggressive learning rate overwrites pretrained commonsense knowledge independently of quantization. Neither is detectable from training loss alone. We address amax saturation with a conservative max-algorithm DTS strategy over a 64-step history window, and mitigate forgetting via a 500-step BF16 warmup followed by QAT at lr = $10^{-5}$. Both fixes are necessary and sufficient: our final configuration achieves a 0.43% MMLU drop, a 0.58% HellaSwag drop, and a 0.22% ARC-Challenge drop versus a matched BF16 baseline, with a training loss APE of only 0.11% over 10,000 steps.
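The max-window scaling idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `MaxWindowScaler`, the helper `saturates`, and the `fmt_max` value used in the usage note are hypothetical, and the real HiF8 maximum would have to be substituted. The sketch assumes that the scale for the current step is derived only from amax values recorded in earlier steps (the "delayed" part) and that the conservative choice is the maximum over the history window.

```python
from collections import deque


class MaxWindowScaler:
    """Delayed scaling that uses the max amax over a sliding history window."""

    def __init__(self, fmt_max, window=64):
        # fmt_max: largest magnitude the target format can represent
        # (hypothetical parameter; supply the real format limit).
        self.fmt_max = float(fmt_max)
        self.history = deque(maxlen=window)  # oldest entries drop off automatically

    def scale(self):
        # Uses only amax values from previous steps, never the current tensor.
        if not self.history:
            return 1.0
        amax = max(self.history)
        return self.fmt_max / amax if amax > 0 else 1.0

    def update(self, values):
        # Record the absolute maximum of the tensor seen at this step.
        self.history.append(max(abs(v) for v in values))


def saturates(values, scale, fmt_max):
    # True if any scaled value would exceed the representable range (clipping).
    return any(abs(v) * scale > fmt_max for v in values)
```

With a hypothetical `fmt_max=100.0` and `window=3`, recording amax values 4.0 and then 0.5 gives a scale of 25.0, because the window maximum is still 4.0. A scale built from the most recent amax alone would be 200.0, and a later activation of magnitude 4.0 would then saturate.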
55.3 SC Mar 29
A Dataset of Nonlinear Equations for Subdivision
Juan Xu, Huilong Lai, Yingying Cheng et al.
In this paper, we report on the largest labelled dataset constructed so far for solving zero-dimensional square nonlinear systems with subdivision-based methods. A brief, non-exhaustive survey with emphasis on the literature from the past two decades is also provided to accompany the dataset. The value of the dataset is demonstrated by benchmarking several solvers and by using it for learning to classify the real roots of nonlinear parametric systems.
71.4 DC Apr 27
SDSL-Solver: Scalable Distributed Sparse Linear Solvers for Large-Scale Interior Point Methods
Shaofeng Yang, Yunting Wang, Yingying Cheng et al.
The solution of sparse linear systems constitutes the dominant computational bottleneck in interior point methods (IPMs), frequently consuming over 70% of the total solution time. As optimization problems scale to millions of variables, direct solvers encounter prohibitive fill-in, excessive memory consumption, and limited parallel scalability. We present SDSL-Solver, a scalable distributed sparse linear solver framework designed for IPMs. SDSL-Solver employs Krylov subspace methods, combined with numerics-based sparse filtering and diagonal correction techniques that produce high-quality preconditioners. To accommodate diverse problem characteristics, SDSL-Solver offers two complementary distributed parallel methods: Block Jacobi for well-conditioned, diagonally dominant systems, and Bordered Block Diagonal (BBD) for ill-conditioned problems requiring globally coupled preconditioning via Schur complement techniques. A preconditioner reuse strategy further amortizes construction costs across consecutive IPM iterations. We evaluate SDSL-Solver on benchmark problems with matrix dimensions ranging from tens of thousands to over five million on multi-node clusters equipped with x86 processors. The experimental results show that, under the Block Jacobi and BBD distributed methods, SDSL-Solver on a four-node configuration achieves average speedups of $6.23\times$ and $7.77\times$, respectively, compared to PETSc running on the same number of nodes. Relative to single-node PARDISO, the average speedups reach $97.54\times$ and $5.85\times$, respectively.
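To make the Block Jacobi idea concrete, here is a small serial sketch of a Krylov method (conjugate gradient) preconditioned by the inverse of the diagonal blocks. It is not SDSL-Solver's code or API. The function names and the dense list-of-lists matrix are illustrative assumptions, the system is assumed symmetric positive definite, and the abstract's sparse filtering, diagonal correction, and distribution across nodes are all omitted.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_dense(B, r):
    # Gaussian elimination with partial pivoting for one small diagonal block.
    n = len(B)
    M = [row[:] + [ri] for row, ri in zip(B, r)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def block_jacobi_apply(A, r, block_size):
    # z = M^{-1} r, where M keeps only the diagonal blocks of A.
    # Each block solve is independent, which is what makes the method parallel.
    z = []
    for s in range(0, len(r), block_size):
        e = min(s + block_size, len(r))
        block = [row[s:e] for row in A[s:e]]
        z.extend(solve_dense(block, r[s:e]))
    return z


def pcg(A, b, block_size, tol=1e-10, max_iter=200):
    # Preconditioned conjugate gradient; assumes A is symmetric positive definite.
    x = [0.0] * len(b)
    r = b[:]
    z = block_jacobi_apply(A, r, block_size)
    p = z[:]
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = block_jacobi_apply(A, r, block_size)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Calling `pcg(A, b, block_size=2)` on a small diagonally dominant symmetric matrix returns the solution and the iteration count. In this sketch the blocks are re-solved on every application; a practical solver would factor each block once and reuse the factors, which is in the spirit of the preconditioner reuse the abstract mentions.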
AI Aug 5, 2025
Compressing Chain-of-Thought in LLMs via Step Entropy
Zeju Li, Jianyuan Zhong, Ziyang Zheng et al.
Large Language Models (LLMs) using Chain-of-Thought (CoT) prompting excel at complex reasoning but generate verbose thought processes with considerable redundancy, leading to increased inference costs and reduced efficiency. We introduce a novel CoT compression framework based on step entropy, a metric that quantifies the informational contribution of individual reasoning steps to identify redundancy. Through theoretical analysis and extensive empirical validation on mathematical reasoning benchmarks, we demonstrate that steps with low entropy are indeed highly redundant. Our experiments reveal that an astonishing 80% of low-entropy intermediate steps can be pruned with minor degradation in the final answer accuracy across DeepSeek-R1-7B, 14B and Qwen3-8B. This finding sharply contrasts with random or high-entropy pruning, which severely impairs reasoning performance. Building on this, we propose a novel two-stage training strategy combining Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) reinforcement learning. This approach enables LLMs to autonomously learn to generate compressed CoTs during inference by strategically incorporating [SKIP] tokens. Our method significantly enhances LLM inference efficiency while rigorously preserving accuracy, offering profound implications for practical LLM deployment and a deeper understanding of reasoning structures.
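One way to picture entropy-based pruning is the sketch below. The abstract does not give the exact formula, so this assumes that a step's entropy is the sum of the Shannon entropies of the model's predictive distributions over the tokens in that step, and that pruning replaces the lowest-entropy fraction of steps with a `[SKIP]` marker. The function names and the `ratio` parameter are hypothetical.

```python
import math


def token_entropy(probs):
    # Shannon entropy (in nats) of one next-token distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)


def step_entropy(token_dists):
    # Assumed definition: sum of token entropies within one reasoning step.
    return sum(token_entropy(d) for d in token_dists)


def compress(steps, entropies, ratio=0.8, skip="[SKIP]"):
    # Replace the `ratio` fraction of steps with the lowest entropy by `skip`,
    # keeping the remaining steps in their original order.
    k = int(len(steps) * ratio)
    order = sorted(range(len(steps)), key=lambda i: entropies[i])
    lowest = set(order[:k])
    return [skip if i in lowest else s for i, s in enumerate(steps)]
```

For example, with five steps and `ratio=0.8`, the four steps with the smallest entropy values are replaced and only the highest-entropy step is kept verbatim.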
AI Aug 18, 2025
GridCodex: A RAG-Driven AI Framework for Power Grid Code Reasoning and Compliance
Jinquan Shi, Yingying Cheng, Fan Zhang et al.
The global shift towards renewable energy presents unprecedented challenges for the electricity industry, making regulatory reasoning and compliance increasingly vital. Grid codes, the regulations governing grid operations, are complex and often lack automated interpretation solutions, which hinders industry expansion and undermines profitability for electricity companies. We introduce GridCodex, an end-to-end framework for grid code reasoning and compliance that leverages large language models and retrieval-augmented generation (RAG). Our framework advances conventional RAG workflows through multi-stage query refinement and enhanced retrieval with RAPTOR. We validate the effectiveness of GridCodex with comprehensive benchmarks, including automated answer assessment across multiple dimensions and regulatory agencies. Experimental results show a 26.4% improvement in answer quality and a more than 10-fold increase in recall rate. An ablation study further examines the impact of base model selection.