40.9AIJun 4
Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM DistillatioFangbo Tu, Junhua Zhao, Chi Liu et al.
Demand for low-precision inference, including NVFP4-based approaches, has grown as large language models are increasingly deployed in latency and cost constrained production environments. Quantization-aware distillation (QAD) helps recover accuracy lost under low bit quantization by training a quantized student to match the output distribution of a frozen higher precision teacher via a KL-divergence loss. In this work, we first provide a representation level diagnosis of QAD: output matching alone can mask internal degradation, because many intermediate activation geometries can yield similar teacher-aligned logits. Using CKA, we show that KL-only QAD can reduce layerwise representational similarity relative to the BF16 teacher, with especially severe drift in RL-post-trained models. This drift correlates with downstream bottlenecks on reasoning and coding tasks, suggesting that low bit recovery requires preserving internal geometry rather than matching outputs alone. Motivated by this finding, we propose \textbf{CKA-QAD}, a CKA-guided representational alignment method for NVFP4 QAD and low bit LLM accuracy recovery. The method adds a lightweight regularizer that preserves internal representational geometry during distillation by aligning layerwise Gram matrices through CKA. Across Nemotron 3 Nano and Qwen3-4B-Thinking-2507, CKA-QAD substantially improves representational alignment and improves downstream reasoning and coding accuracy with modest training overhead. Our findings position CKA-guided representational alignment as a practical complement to output matching for quantized LLM recovery.
AISep 26, 2024
From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with ReflectionXinlei Wang, Maike Feng, Jing Qiu et al.
This paper introduces a novel approach that leverages Large Language Models (LLMs) and Generative Agents to enhance time series forecasting by reasoning across both text and time series data. With language as a medium, our method adaptively integrates social events into forecasting models, aligning news content with time series fluctuations to provide richer insights. Specifically, we utilize LLM-based agents to iteratively filter out irrelevant news and employ human-like reasoning to evaluate predictions. This enables the model to analyze complex events, such as unexpected incidents and shifts in social behavior, and continuously refine the selection logic of news and the robustness of the agent's output. By integrating selected news events with time series data, we fine-tune a pre-trained LLM to predict sequences of digits in time series. The results demonstrate significant improvements in forecasting accuracy, suggesting a potential paradigm shift in time series forecasting through the effective utilization of unstructured news data.
AIJul 7, 2024
ElecBench: a Power Dispatch Evaluation Benchmark for Large Language ModelsXiyuan Zhou, Huan Zhao, Yuheng Cheng et al.
In response to the urgent demand for grid stability and the complex challenges posed by renewable energy integration and electricity market dynamics, the power sector increasingly seeks innovative technological solutions. In this context, large language models (LLMs) have become a key technology to improve efficiency and promote intelligent progress in the power sector with their excellent natural language processing, logical reasoning, and generalization capabilities. Despite their potential, the absence of a performance evaluation benchmark for LLM in the power sector has limited the effective application of these technologies. Addressing this gap, our study introduces "ElecBench", an evaluation benchmark of LLMs within the power sector. ElecBench aims to overcome the shortcomings of existing evaluation benchmarks by providing comprehensive coverage of sector-specific scenarios, deepening the testing of professional knowledge, and enhancing decision-making precision. The framework categorizes scenarios into general knowledge and professional business, further divided into six core performance metrics: factuality, logicality, stability, security, fairness, and expressiveness, and is subdivided into 24 sub-metrics, offering profound insights into the capabilities and limitations of LLM applications in the power sector. To ensure transparency, we have made the complete test set public, evaluating the performance of eight LLMs across various scenarios and metrics. ElecBench aspires to serve as the standard benchmark for LLM applications in the power sector, supporting continuous updates of scenarios, metrics, and models to drive technological progress and application.
96.9CYMay 2Code
Hugging Carbon: Quantifying the Training Carbon Emissions of AI Models at ScaleXinlei Wang, Ruibo Ming, Jing Qiu et al.
The scaling-law era has transformed artificial intelligence from research into a global industry, but its rapid growth raises concerns over energy usage, carbon emissions, and environmental sustainability. Unlike traditional sectors, the AI industry still lacks systematic carbon accounting methods that support large-scale estimates without reproducing the original model. This leaves open questions about how large the problem is today and how large it might be in the near future. Given that the Hugging Face (HF) platform well represents the broader open-source community, we treat it as a large-scale, publicly accessible, and audit-ready corpus for carbon accounting. We propose a FLOPs-based framework to estimate aggregate training emissions of HF open-source models. Considering their uneven disclosure quality, we introduce a tiered approach to handle incomplete metadata, supported by empirical regressions that verify the statistical significance. Compute is also converted to AI training carbon intensity (ATCI, emissions per compute), a metric to assess the sustainability efficiency of model training. Our results show that training the most popular open-source models (with over 5,000 downloads) has resulted in approximately $5.8\times10^4$ metric tons of carbon emissions. This paper provides a scalable framework for emission estimations and a practical methodology to guide future standards and sustainability strategies in the AI industry.
93.3AIMay 4Code
EngiAgent: Fully Connected Coordination of LLM Agents for Solving Open-ended Engineering Problems with Feasible SolutionsXiyuan Zhou, Ruixi Zou, Xinlei Wang et al.
Engineering problem solving is central to real-world decision-making, requiring mathematical formulations that not only represent complex problems but also produce feasible solutions under data and physical constraints. Unlike mathematical problem solving, which operates on predefined formulations, engineering tasks demand open-ended analysis, feasibility-driven modeling, and iterative refinement. Although large language models (LLMs) have shown strong capabilities in reasoning and code generation, they often fail to ensure feasibility, which limits their applicability to engineering problem solving. To address this challenge, we propose EngiAgent, a multi-agent system with a fully connected coordinator that simulates expert workflows through specialized agents for problem analysis, modeling, verification, solving, and solution evaluation. The fully connected coordinator enables flexible feedback routing, overcoming the rigidity of prior pipeline-based reflection methods and ensuring feasibility at every stage of the process. This design not only improves robustness to diverse failure cases such as data extraction errors, constraint inconsistencies, and solver failures, but also enhances the overall quality of problem solving. Empirical results across four representative domains demonstrate that EngiAgent achieves substantial improvements in feasibility compared to prior approaches, establishing a new paradigm for feasibility-oriented engineering problem solving with LLMs. Our source code and data are available at https://github.com/AI4Engi/EngiAgent.
CLDec 29, 2024Code
Multi-Objective Large Language Model UnlearningZibin Pan, Shuwen Zhang, Yuesheng Zheng et al.
Machine unlearning in the domain of large language models (LLMs) has attracted great attention recently, which aims to effectively eliminate undesirable behaviors from LLMs without full retraining from scratch. In this paper, we explore the Gradient Ascent (GA) approach in LLM unlearning, which is a proactive way to decrease the prediction probability of the model on the target data in order to remove their influence. We analyze two challenges that render the process impractical: gradient explosion and catastrophic forgetting. To address these issues, we propose Multi-Objective Large Language Model Unlearning (MOLLM) algorithm. We first formulate LLM unlearning as a multi-objective optimization problem, in which the cross-entropy loss is modified to the unlearning version to overcome the gradient explosion issue. A common descent update direction is then calculated, which enables the model to forget the target data while preserving the utility of the LLM. Our empirical results verify that MoLLM outperforms the SOTA GA-based LLM unlearning methods in terms of unlearning effect and model utility preservation. The source code is available at https://github.com/zibinpan/MOLLM.
AISep 22, 2025Code
EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem SolvingXiyuan Zhou, Xinlei Wang, Yirui He et al.
Large language models (LLMs) have shown strong performance on mathematical reasoning under well-posed conditions. However, real-world engineering problems require more than mathematical symbolic computation -- they need to deal with uncertainty, context, and open-ended scenarios. Existing benchmarks fail to capture these complexities. We introduce EngiBench, a hierarchical benchmark designed to evaluate LLMs on solving engineering problems. It spans three levels of increasing difficulty (foundational knowledge retrieval, multi-step contextual reasoning, and open-ended modeling) and covers diverse engineering subfields. To facilitate a deeper understanding of model performance, we systematically rewrite each problem into three controlled variants (perturbed, knowledge-enhanced, and math abstraction), enabling us to separately evaluate the model's robustness, domain-specific knowledge, and mathematical reasoning abilities. Experiment results reveal a clear performance gap across levels: models struggle more as tasks get harder, perform worse when problems are slightly changed, and fall far behind human experts on the high-level engineering tasks. These findings reveal that current LLMs still lack the high-level reasoning needed for real-world engineering, highlighting the need for future models with deeper and more reliable problem-solving capabilities. Our source code and data are available at https://github.com/EngiBench/EngiBench.
94.2SYMay 13
Battery-Assisted Operation of Hyperscale AI Data Centers under Connect-and-Manage Interconnection PracticesXin Lu, Jing Qiu, Jiafeng Lin et al.
Emerging connect-and-manage practices allow new transmission-connected mega-loads to connect while enforcing time-varying admissible power exchange limits at the point of common coupling (PCC) in real time. Hyperscale artificial intelligence data centers (AIDCs), whose demand can reach hundreds of megawatts and whose internal computing-cooling dynamics evolve rapidly, can therefore face frequent conflicts between workload continuity requirements and externally imposed PCC envelopes. This paper proposes a battery-assisted operational framework in which on-site battery energy storage (BESS) serves as a physical buffering interface to reconcile fast internal dynamics with time-varying interconnection limits. A continuity-aware energy-computation model is developed to jointly capture checkpoint-constrained AI training workloads, information technology (IT) computing power-throughput characteristics, and IT-cooling thermal dynamics. A two-stage decision framework is then formulated, consisting of scenario-based day-ahead workload commitment and a real-time receding-horizon delivery assurance controller that enforces battery, thermal, and grid-interaction constraints. Case studies on the IEEE 39-bus system with Australian real data demonstrate that BESS substantially increases credible day-ahead workload commitment and improves real-time delivery robustness under transmission congestion. Sensitivity analyses further reveal a regime-dependent role transition of BESS -- from feasibility-oriented continuity support when PCC limits are binding to economy-driven flexibility provision as transmission constraints are relaxed.
AIJan 7, 2024
Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and ProspectsYuheng Cheng, Ceyao Zhang, Zhengwen Zhang et al. · pku
Intelligent agents stand out as a potential path toward artificial general intelligence (AGI). Thus, researchers have dedicated significant effort to diverse implementations for them. Benefiting from recent progress in large language models (LLMs), LLM-based agents that use universal natural language as an interface exhibit robust generalization capabilities across various applications -- from serving as autonomous general-purpose task assistants to applications in coding, social, and economic domains, LLM-based agents offer extensive exploration opportunities. This paper surveys current research to provide an in-depth overview of LLM-based intelligent agents within single-agent and multi-agent systems. It covers their definitions, research frameworks, and foundational components such as their composition, cognitive and planning methods, tool utilization, and responses to environmental feedback. We also delve into the mechanisms of deploying LLM-based agents in multi-agent systems, including multi-role collaboration, message passing, and strategies to alleviate communication issues between agents. The discussions also shed light on popular datasets and application scenarios. We conclude by envisioning prospects for LLM-based agents, considering the evolving landscape of AI and natural language processing.
LGMar 30, 2024
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and MethodsYuji Cao, Huan Zhao, Yuheng Cheng et al.
With extensive pre-trained knowledge and high-level general capabilities, large language models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in aspects such as multi-task learning, sample efficiency, and high-level task planning. In this survey, we provide a comprehensive review of the existing literature in LLM-enhanced RL and summarize its characteristics compared to conventional RL methods, aiming to clarify the research scope and directions for future studies. Utilizing the classical agent-environment interaction paradigm, we propose a structured taxonomy to systematically categorize LLMs' functionalities in RL, including four roles: information processor, reward designer, decision-maker, and generator. For each role, we summarize the methodologies, analyze the specific RL challenges that are mitigated, and provide insights into future directions. Lastly, a comparative analysis of each role, potential applications, prospective opportunities, and challenges of the LLM-enhanced RL are discussed. By proposing this taxonomy, we aim to provide a framework for researchers to effectively leverage LLMs in the RL field, potentially accelerating RL applications in complex applications such as robotics, autonomous driving, and energy systems.
AO-PHOct 11, 2022
Near Real-time CO$_2$ Emissions Based on Carbon Satellite and Artificial IntelligenceZhengwen Zhang, Jinjin Gu, Junhua Zhao et al.
To limit global warming to pre-industrial levels, global governments, industry and academia are taking aggressive efforts to reduce carbon emissions. The evaluation of anthropogenic carbon dioxide (CO$_2$) emissions, however, depends on the self-reporting information that is not always reliable. Society need to develop an objective, independent, and generalized system to meter CO$_2$ emissions. Satellite CO$_2$ observation from space that reports column-average regional CO$_2$ dry-air mole fractions has gradually indicated its potential to build such a system. Nevertheless, estimating anthropogenic CO$_2$ emissions from CO$_2$ observing satellite is bottlenecked by the influence of the highly complicated physical characteristics of atmospheric activities. Here we provide the first method that combines the advanced artificial intelligence (AI) techniques and the carbon satellite monitor to quantify anthropogenic CO$_2$ emissions. We propose an integral AI based pipeline that contains both a data retrieval algorithm and a two-step data-driven solution. First, the data retrieval algorithm can generate effective datasets from multi-modal data including carbon satellite, the information of carbon sources, and several environmental factors. Second, the two-step data-driven solution that applies the powerful representation of deep learning techniques to learn to quantify anthropogenic CO$_2$ emissions from satellite CO$_2$ observation with other factors. Our work unmasks the potential of quantifying CO$_2$ emissions based on the combination of deep learning algorithms and the carbon satellite monitor.
LGDec 28, 2024
Federated Unlearning with Gradient Descent and Conflict MitigationZibin Pan, Zhichao Wang, Chi Li et al.
Federated Learning (FL) has received much attention in recent years. However, although clients are not required to share their data in FL, the global model itself can implicitly remember clients' local data. Therefore, it's necessary to effectively remove the target client's data from the FL global model to ease the risk of privacy leakage and implement ``the right to be forgotten". Federated Unlearning (FU) has been considered a promising way to remove data without full retraining. But the model utility easily suffers significant reduction during unlearning due to the gradient conflicts. Furthermore, when conducting the post-training to recover the model utility, the model is prone to move back and revert what has already been unlearned. To address these issues, we propose Federated Unlearning with Orthogonal Steepest Descent (FedOSD). We first design an unlearning Cross-Entropy loss to overcome the convergence issue of the gradient ascent. A steepest descent direction for unlearning is then calculated in the condition of being non-conflicting with other clients' gradients and closest to the target client's gradient. This benefits to efficiently unlearn and mitigate the model utility reduction. After unlearning, we recover the model utility by maintaining the achievement of unlearning. Finally, extensive experiments in several FL scenarios verify that FedOSD outperforms the SOTA FU algorithms in terms of unlearning and model utility.
SYDec 17, 2024
Coordinated Power Smoothing Control for Wind Storage Integrated System with Physics-informed Deep Reinforcement LearningShuyi Wang, Huan Zhao, Yuji Cao et al.
The Wind Storage Integrated System with Power Smoothing Control (PSC) has emerged as a promising solution to ensure both efficient and reliable wind energy generation. However, existing PSC strategies overlook the intricate interplay and distinct control frequencies between batteries and wind turbines, and lack consideration of wake effect and battery degradation cost. In this paper, a novel coordinated control framework with hierarchical levels is devised to address these challenges effectively, which integrates the wake model and battery degradation model. In addition, after reformulating the problem as a Markov decision process, the multi-agent reinforcement learning method is introduced to overcome the bi-level characteristic of the problem. Moreover, a Physics-informed Neural Network-assisted Multi-agent Deep Deterministic Policy Gradient (PAMA-DDPG) algorithm is proposed to incorporate the power fluctuation differential equation and expedite the learning process. The effectiveness of the proposed methodology is evaluated through simulations conducted in four distinct scenarios using WindFarmSimulator (WFSim). The results demonstrate that the proposed algorithm facilitates approximately an 11% increase in total profit and a 19% decrease in power fluctuation compared to the traditional methods, thereby addressing the dual objectives of economic efficiency and grid-connected energy reliability.
AIOct 14, 2025
Toward Reasoning-Centric Time-Series AnalysisXinlei Wang, Mingtian Tan, Jing Qiu et al.
Traditional time series analysis has long relied on pattern recognition, trained on static and well-established benchmarks. However, in real-world settings -- where policies shift, human behavior adapts, and unexpected events unfold -- effective analysis must go beyond surface-level trends to uncover the actual forces driving them. The recent rise of Large Language Models (LLMs) presents new opportunities for rethinking time series analysis by integrating multimodal inputs. However, as the use of LLMs becomes popular, we must remain cautious, asking why we use LLMs and how to exploit them effectively. Most existing LLM-based methods still employ their numerical regression ability and ignore their deeper reasoning potential. This paper argues for rethinking time series with LLMs as a reasoning task that prioritizes causal structure and explainability. This shift brings time series analysis closer to human-aligned understanding, enabling transparent and context-aware insights in complex real-world environments.
LGMay 24, 2021
Fed-NILM: A Federated Learning-based Non-Intrusive Load Monitoring Method for Privacy-ProtectionHaijin Wang, Caomingzhe Si, Junhua Zhao et al.
Non-intrusive load monitoring (NILM) is essential for understanding customer's power consumption patterns and may find wide applications like carbon emission reduction and energy conservation. The training of NILM models requires massive load data containing different types of appliances. However, inadequate load data and the risk of power consumer privacy breaches may be encountered by local data owners during the NILM model training. To prevent such potential risks, a novel NILM method named Fed-NILM which is based on Federated Learning (FL) is proposed in this paper. In Fed-NILM, local model parameters instead of local load data are shared among multiple data owners. The global model is obtained by weighted averaging the parameters. Experiments based on two measured load datasets are conducted to explore the generalization ability of Fed-NILM. Besides, a comparison of Fed-NILM with locally-trained NILMs and the centrally-trained NILM is conducted. The experimental results show that Fed-NILM has superior performance in scalability and convergence. Fed-NILM outperforms locally-trained NILMs operated by local data owners and approximates the centrally-trained NILM which is trained on the entire load dataset without privacy protection. The proposed Fed-NILM significantly improves the co-modeling capabilities of local data owners while protecting power consumers' privacy.
SPApr 4, 2021
A Federated Learning Framework for Non-Intrusive Load MonitoringHaijin Wang, Caomingzhe Si, Junhua Zhao
Non-intrusive load monitoring (NILM) aims at decomposing the total reading of the household power consumption into appliance-wise ones, which is beneficial for consumer behavior analysis as well as energy conservation. NILM based on deep learning has been a focus of research. To train a better neural network, it is necessary for the network to be fed with massive data containing various appliances and reflecting consumer behavior habits. Therefore, data cooperation among utilities and DNOs (distributed network operators) who own the NILM data has been increasingly significant. During the cooperation, however, risks of consumer privacy leakage and losses of data control rights arise. To deal with the problems above, a framework to improve the performance of NILM with federated learning (FL) has been set up. In the framework, model weights instead of the local data are shared among utilities. The global model is generated by weighted averaging the locally-trained model weights to gather the locally-trained model information. Optimal model selection help choose the model which adapts to the data from different domains best. Experiments show that this proposal improves the performance of local NILM runners. The performance of this framework is close to that of the centrally-trained model obtained by the convergent data without privacy protection.
SPSep 6, 2018
Super-Resolution Perception for Industrial Sensor DataJinjin Gu, Haoyu Chen, Guolong Liu et al.
In this paper, we present the problem formulation and methodology framework of Super-Resolution Perception (SRP) on industrial sensor data. Industrial intelligence relies on high-quality industrial sensor data for system control, diagnosis, fault detection, identification, and monitoring. However, the provision of high-quality data may be expensive in some cases. In this paper, we propose a novel machine learning problem -- the SRP problem as reconstructing high-quality data from unsatisfactory sensor data in industrial systems. Advanced generative models are then proposed to solve the SRP problem. This technology makes it possible to empower existing industrial facilities without upgrading existing sensors or deploying additional sensors. We first mathematically formulate the SRP problem under the Maximum a Posteriori (MAP) estimation framework. A case study is then presented, which performs SRP on smart meter data. A network, namely SRPNet, is proposed to generate high-frequency load data from low-frequency data. We further employ a novel recognition-based loss and relativistic adversarial loss to constraint the reconstruction of waveforms explicitly. Experiments demonstrate that our SRP model can reconstruct high-frequency data effectively. Moreover, the reconstructed high-frequency data can lead to better appliance monitoring results without changing the monitoring appliances.