LGJul 30, 2024
Robust Load Prediction of Power Network Clusters Based on Cloud-Model-Improved TransformerCheng Jiang, Gang Lu, Xue Ma et al.
Load data from power network clusters indicates economic development in each area, crucial for predicting regional trends and guiding power enterprise decisions. The Transformer model, a leading method for load prediction, faces challenges modeling historical data due to variables like weather, events, festivals, and data volatility. To tackle this, the cloud model's fuzzy feature is utilized to manage uncertainties effectively. Presenting an innovative approach, the Cloud Model Improved Transformer (CMIT) method integrates the Transformer model with the cloud model utilizing the particle swarm optimization algorithm, with the aim of achieving robust and precise power load predictions. Through comparative experiments conducted on 31 real datasets within a power network cluster, it is demonstrated that CMIT significantly surpasses the Transformer model in terms of prediction accuracy, thereby highlighting its effectiveness in enhancing forecasting capabilities within the power network cluster sector.
LGAug 25, 2025
Adaptive Ensemble Learning with Gaussian Copula for Load ForecastingJunying Yang, Gang Lu, Xiaoqing Yan et al.
Machine learning (ML) is capable of accurate Load Forecasting from complete data. However, there are many uncertainties that affect data collection, leading to sparsity. This article proposed a model called Adaptive Ensemble Learning with Gaussian Copula to deal with sparsity, which contains three modules: data complementation, ML construction, and adaptive ensemble. First, it applies Gaussian Copula to eliminate sparsity. Then, we utilise five ML models to make predictions individually. Finally, it employs adaptive ensemble to get final weighted-sum result. Experiments have demonstrated that our model are robust.
DCJun 7, 2024
Enhancing Large-Scale AI Training Efficiency: The C4 Solution for Real-Time Anomaly Detection and Communication OptimizationJianbo Dong, Bin Luo, Jun Zhang et al.
The emergence of Large Language Models (LLMs) has necessitated the adoption of distributed training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, the efficiency of large-scale distributed training systems is often suboptimal due to the increased likelihood of hardware errors in high-end GPU products and the heightened risk of network traffic collisions. Moreover, any local hardware failure can disrupt training tasks, and the inability to swiftly identify faulty components leads to a significant waste of GPU resources. And, prolonged communication due to traffic collisions can substantially increase GPU waiting times. To address these challenges, we propose a communication-driven solution, namely the C4. The key insights of C4 are twofold. First, the load in distributed training exhibits homogeneous characteristics and is divided into iterations through periodic synchronization, therefore hardware anomalies would incur certain syndrome in collective communication. By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding resource wastage caused by delays in anomaly detection. Second, the predictable communication model of collective communication, involving a limited number of long-lived flows, allows C4 to efficiently execute traffic planning, substantially reducing bandwidth competition among these flows. The C4 has been extensively deployed across real-world production systems in a hyperscale cloud provider, yielding a significant improvement in system efficiency, from 30% to 45%. This enhancement is attributed to a 30% reduction in error-induced overhead and a 15% reduction in communication costs.
PFFeb 17, 2020
AIBench: An Agile Domain-specific Benchmarking Methodology and an AI Benchmark SuiteWanling Gao, Fei Tang, Jianfeng Zhan et al.
Domain-specific software and hardware co-design is encouraging as it is much easier to achieve efficiency for fewer tasks. Agile domain-specific benchmarking speeds up the process as it provides not only relevant design inputs but also relevant metrics, and tools. Unfortunately, modern workloads like Big data, AI, and Internet services dwarf the traditional one in terms of code size, deployment scale, and execution path, and hence raise serious benchmarking challenges. This paper proposes an agile domain-specific benchmarking methodology. Together with seventeen industry partners, we identify ten important end-to-end application scenarios, among which sixteen representative AI tasks are distilled as the AI component benchmarks. We propose the permutations of essential AI and non-AI component benchmarks as end-to-end benchmarks. An end-to-end benchmark is a distillation of the essential attributes of an industry-scale application. We design and implement a highly extensible, configurable, and flexible benchmark framework, on the basis of which, we propose the guideline for building end-to-end benchmarks, and present the first end-to-end Internet service AI benchmark. The preliminary evaluation shows the value of our benchmark suite---AIBench against MLPerf and TailBench for hardware and software designers, micro-architectural researchers, and code developers. The specifications, source code, testbed, and results are publicly available from the web site \url{http://www.benchcouncil.org/AIBench/index.html}.