Jiaqi Ruan

3 papers

59 citations

Novelty 35%

AI Score 44

Ranked #74,825 of 205,806 authors (top 36%); #4,563 in AI (top 32%)

3 Papers

14.5 SY Jun 3

Peer-to-Peer Cloud Service Market for Data Centers Oriented to Computation-Electricity Coordination

Yugui Liu, Yibo Ding, Xudong Li et al.

Energy-intensive data centers (DCs) have emerged as substantial and flexible loads in modern power systems, underscoring the critical need for computation-electricity coordination. Harnessing the spatio-temporal flexibility of DC workloads is a promising approach to facilitate this coordination. However, existing studies overlook the collaborative potential of computational resource sharing among geo-distributed DCs, thereby failing to fully unlock this flexibility. In this paper, a bi-level computation-electricity coordination framework is proposed to explicitly capture the bidirectional interactions between DCs and the power grid. Firstly, a peer-to-peer cloud service market (P2P-CSM) for geo-distributed DCs is proposed, which enables bilateral cloud service transactions to leverage regional heterogeneities (e.g., electricity prices, cooling efficiency). Secondly, locational marginal prices are embedded into the framework to reflect network congestion and nodal price disparities. Thirdly, a dual consensus alternating direction method of multipliers (ADMM)-based decentralized algorithm is developed as the P2P market clearing algorithm, and a bisection-assisted iterative algorithm is proposed to ensure rigorous convergence of the framework. Case studies conducted on a modified IEEE 30-bus system validate that the P2P-CSM achieves a win-win computation-electricity coordination: it not only increases total DC operational profit by 22.8%, but also effectively alleviates grid congestion and yields a 3.2% reduction in total energy consumption.

AI Nov 22, 2023

Applying Large Language Models to Power Systems: Potential Security Threats

Jiaqi Ruan, Gaoqi Liang, Huan Zhao et al.

Applying large language models (LLMs) to modern power systems presents a promising avenue for enhancing decision-making and operational efficiency. However, this action may also incur potential security threats, which have not been fully recognized so far. To this end, this article analyzes potential threats incurred by applying LLMs to power systems, emphasizing the need for urgent research and development of countermeasures.

LG Nov 18, 2025 Code

CLO: Efficient LLM Inference System with CPU-Light KVCache Offloading via Algorithm-System Co-Design

Jiawei Yi, Ping Gong, Youhui Bai et al.

The growth of million-token LLMs exposes the scalability limits of inference systems, where the KVCache dominates memory usage and data transfer overhead. Recent offloading systems migrate the KVCache to CPU memory and incorporate top-k attention to reduce the volume of data transferred from the CPU, while further applying system-level optimizations such as on-GPU caching and prefetching to lower transfer overhead. However, they overlook the CPU bottleneck in three aspects: (1) substantial overhead of fine-grained dynamic cache management performed on the CPU side, (2) significant transfer overhead from poor PCIe bandwidth utilization caused by heavy gathering operations at the CPU side, and (3) GPU runtime bubbles introduced by coarse-grained CPU-centric synchronization. To address these challenges, we propose CLO, a CPU-light KVCache offloading system via algorithm-system co-design. CLO features: (1) a coarse-grained head-wise approximate on-GPU caching strategy with negligible cache management cost, (2) seamless combination of data prefetching and on-GPU persistent caching for lower transfer overhead, (3) a zero-copy transfer engine to fully exploit PCIe bandwidth, and a GPU-centric synchronization method to eliminate GPU stalls. Evaluation on two widely-used LLMs demonstrates that CLO achieves comparable accuracy to state-of-the-art systems, while substantially minimizing CPU overhead, fully utilizing PCIe bandwidth, thus improving decoding throughput by 9.3%-66.6%. Our results highlight that algorithm-system co-design is essential for memory-constrained LLM inference on modern GPU platforms. We open source CLO at https://github.com/CommediaJW/CLO.