DB · AI · Aug 29, 2025

EPIC: Generative AI Platform for Accelerating HPC Operational Data Analytics

arXiv:2509.16212v1 · h-index: 10
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for dynamic and adaptable analytics in HPC operations. It offers a domain-specific solution whose contribution is incremental, combining existing methods.

The authors address static, inflexible HPC operational data analytics by developing EPIC, an AI-driven platform with a hierarchical multi-agent architecture. In descriptive analytics, fine-tuned smaller models achieve up to 26% higher accuracy than large foundation models, and a hybrid of large foundation models and fine-tuned local open-weight models cuts LLM operational costs by 19x compared to proprietary solutions.

We present EPIC, an AI-driven platform designed to augment operational data analytics. EPIC employs a hierarchical multi-agent architecture where a top-level large language model provides query processing, reasoning and synthesis capabilities. These capabilities orchestrate three specialized low-level agents for information retrieval, descriptive analytics, and predictive analytics. This architecture enables EPIC to perform HPC operational analytics on multi-modal data, including text, images, and tabular formats, dynamically and iteratively. EPIC addresses the limitations of existing HPC operational analytics approaches, which rely on static methods that struggle to adapt to evolving analytics tasks and stakeholder demands. Through extensive evaluations on the Frontier HPC system, we demonstrate that EPIC effectively handles complex queries. Using descriptive analytics as a use case, fine-tuned smaller models outperform large state-of-the-art foundation models, achieving up to 26% higher accuracy. Additionally, we achieved 19x savings in LLM operational costs compared to proprietary solutions by employing a hybrid approach that combines large foundational models with fine-tuned local open-weight models.
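The hierarchical dispatch described in the abstract can be illustrated with a minimal sketch. This is not EPIC's implementation: the function names, the keyword lists, and the rule-based `plan` step are all hypothetical stand-ins, and the keyword matching merely takes the place of the top-level LLM's query processing and reasoning.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the three low-level agents named in the abstract.
# Real agents would query documents, telemetry tables, or trained models.
def information_retrieval(query: str) -> str:
    return f"[retrieval] documents relevant to: {query}"

def descriptive_analytics(query: str) -> str:
    return f"[descriptive] summary statistics for: {query}"

def predictive_analytics(query: str) -> str:
    return f"[predictive] forecast for: {query}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "retrieval": information_retrieval,
    "descriptive": descriptive_analytics,
    "predictive": predictive_analytics,
}

# Assumption: simple keyword rules replace the top-level LLM's reasoning.
ROUTING_KEYWORDS = {
    "retrieval": ("document", "manual", "policy"),
    "descriptive": ("average", "how many", "total"),
    "predictive": ("forecast", "predict"),
}

def plan(query: str) -> List[str]:
    """Pick which agents to call; fall back to retrieval if nothing matches."""
    q = query.lower()
    selected = [
        name for name, keywords in ROUTING_KEYWORDS.items()
        if any(keyword in q for keyword in keywords)
    ]
    return selected or ["retrieval"]

def answer(query: str) -> str:
    """Top-level step: plan, dispatch to the chosen agents, then combine results."""
    partials = [AGENTS[name](query) for name in plan(query)]
    return "\n".join(partials)

if __name__ == "__main__":
    print(answer("Predict the total energy use for next week"))
```

In this sketch a single query can fan out to more than one agent, and the top level concatenates the partial answers; in the platform described above, that synthesis step would be performed by the top-level model rather than by string joining.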

