DC · Mar 14

Calibrating Microgrid Simulations for Energy-Aware Computing Systems

arXiv:2604.09615 · 6.6 · h-index: 2

Predicted impact: top 87% in DC · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the challenge of building cost-effective, realistic testbeds for energy-aware computing systems, which matters for reducing the environmental impact of data centers. The contribution is incremental, since it builds on existing frameworks such as Vessim and Kepler.

The paper addresses the scarcity of realistic development environments for carbon-aware computing by proposing a self-calibrating, energy-aware software testbed that integrates renewable energy simulators with real computing nodes. After calibration, power-approximation accuracy improved by ~50% for GPU workloads and by ~3.5% for CPU workloads.
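The calibration idea can be illustrated with a minimal sketch. It assumes that the socket meter reports total node power, that a fixed idle (static) power figure is known, and that per-process dynamic estimates, such as those Kepler produces, are rescaled proportionally so they sum to the metered dynamic power. This proportional scheme and the names `calibrate_process_power`, `meter_power_w`, and `idle_power_w` are hypothetical. The summary does not specify how the testbed actually performs calibration inside Vessim.

```python
from typing import Dict


def calibrate_process_power(
    process_estimates_w: Dict[str, float],
    meter_power_w: float,
    idle_power_w: float,
) -> Dict[str, float]:
    """Rescale per-process dynamic power estimates so they sum to the
    dynamic power seen at the socket meter (meter reading minus idle power).

    Hypothetical proportional-scaling scheme, not the paper's published method.
    """
    estimated_dynamic_w = sum(process_estimates_w.values())
    # Clamp at zero in case the idle figure exceeds the meter reading.
    measured_dynamic_w = max(meter_power_w - idle_power_w, 0.0)
    if estimated_dynamic_w <= 0.0:
        # No estimated dynamic power to scale: return zeros instead of dividing by zero.
        return {pid: 0.0 for pid in process_estimates_w}
    scale = measured_dynamic_w / estimated_dynamic_w
    return {pid: watts * scale for pid, watts in process_estimates_w.items()}


if __name__ == "__main__":
    # Hypothetical sample: Kepler-style per-process estimates in watts.
    estimates = {"train.py": 120.0, "data_loader": 30.0}
    calibrated = calibrate_process_power(estimates, meter_power_w=215.0, idle_power_w=35.0)
    print(calibrated)
    # → {'train.py': 144.0, 'data_loader': 36.0}
```

Proportional scaling keeps each process's relative share while forcing the total to agree with the meter. It cannot fix an error in the idle figure itself, which would require a separate adjustment.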

The surge in demand for computing resources is driving up global electricity consumption in data centers, which is expected to exceed 1000 TWh by 2026, mainly due to the adoption of new AI technologies. Carbon-aware computing strategies can mitigate this environmental impact by aligning power consumption with the production of low-carbon renewable energy, but they are hampered by the scarcity of development environments. Existing solutions rely either on costly, complex physical system architectures that are difficult to integrate and maintain, or on full simulations that, while more economical, often lack realism because they ignore system overheads, real-time node power consumption, and resource fluctuations. This thesis addresses these issues by proposing a self-calibrating, energy-aware software testbed that uses the Software-in-the-Loop co-simulation framework Vessim to integrate renewable energy production simulators with real computing nodes. The application-level power consumption of these nodes is first approximated by the Kepler framework and then calibrated within Vessim's microgrid simulation, using an external socket power meter as the definitive system-level measurement source. An evaluation of the testbed with GPU- and CPU-intensive workloads shows that the Kepler framework approximates the power consumption of the whole computing node fairly accurately, with an average regression coefficient of 1.01 and R^2 values of 0.95, although certain machine learning workloads showed higher deviation. The average static y-intercept of the regression lines, ~5.23 W, indicates inaccuracies in the idle power approximation. Calibrating the dynamic per-process power consumption improved accuracy by ~50% for GPU workloads, while CPU workloads saw a modest improvement of ~3.5%.
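The regression figures above (slope, y-intercept, R^2) come from comparing Kepler's node-level estimate with the socket meter. A minimal sketch of such a comparison follows. It assumes paired, time-aligned samples in watts, with the meter reading as the dependent variable. The regression direction, the sampling and alignment procedure, and the names `fit_estimate_vs_meter` and `FitResult` are assumptions made for illustration and are not taken from the abstract.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FitResult:
    slope: float        # near 1.0 when the estimate tracks meter changes one-to-one
    intercept_w: float  # constant offset in watts, e.g. idle power the estimate misses
    r_squared: float    # share of meter variance explained by the estimate


def fit_estimate_vs_meter(
    estimated_w: Sequence[float], measured_w: Sequence[float]
) -> FitResult:
    """Ordinary least squares fit of: measured = slope * estimated + intercept."""
    n = len(estimated_w)
    if n != len(measured_w) or n < 2:
        raise ValueError("need two equally long series with at least 2 samples")
    mean_x = sum(estimated_w) / n
    mean_y = sum(measured_w) / n
    sxx = sum((x - mean_x) ** 2 for x in estimated_w)
    syy = sum((y - mean_y) ** 2 for y in measured_w)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated_w, measured_w))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("a constant series leaves slope or R^2 undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple OLS with an intercept, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return FitResult(slope, intercept, r_squared)


if __name__ == "__main__":
    # Hypothetical paired samples: the meter always reads 5 W above the estimate.
    est = [50.0, 100.0, 150.0, 200.0]
    meter = [55.0, 105.0, 155.0, 205.0]
    print(fit_estimate_vs_meter(est, meter))
    # → FitResult(slope=1.0, intercept_w=5.0, r_squared=1.0)
```

With this synthetic data, the fit recovers a slope of 1.0 and a 5 W intercept. That is the same pattern (slope near 1, positive offset) that the abstract links to inaccurate idle power approximation.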
