IMAI · Oct 14, 2025

InferA: A Smart Assistant for Cosmological Ensemble Data

arXiv:2510.12920v1 · 6 citations · h-index: 17 · SC25-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
Originality: Incremental advance
AI Analysis

This work addresses terabyte-scale scientific data analysis for researchers. It appears incremental, because it builds on existing automation tools and multi-agent concepts.

The authors tackled the challenge of analyzing large-scale scientific datasets by proposing InferA, a multi-agent system that leverages large language models to enable scalable and efficient data analysis, and demonstrated its usability on a cosmology simulation dataset of several terabytes.

Analyzing large-scale scientific datasets presents substantial challenges due to their sheer volume, structural complexity, and the need for specialized domain knowledge. Automation tools such as PandasAI typically require full data ingestion and lack context about the overall data structure, which makes them impractical as intelligent data analysis assistants for terabyte-scale datasets. To overcome these limitations, we propose InferA, a multi-agent system that leverages large language models to enable scalable and efficient scientific data analysis. At the core of the architecture is a supervisor agent that orchestrates a team of specialized agents, each responsible for a distinct phase of data retrieval and analysis. The system engages interactively with users to elicit their analytical intent and confirm query objectives, ensuring alignment between user goals and system actions. To demonstrate the framework's usability, we evaluate the system on ensemble runs from the HACC cosmology simulation, which together comprise several terabytes.
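The supervisor pattern described in the abstract can be illustrated with a minimal, LLM-free sketch. It shows one supervisor routing a shared state through an intent agent (which asks the user to confirm before any data is touched), a retrieval agent, and an analysis agent. The retrieval agent streams data in chunks and keeps only the needed column, in contrast to full ingestion. Everything here is a hypothetical stand-in rather than InferA's code: the agent functions, the `State` fields, the synthetic `read_chunks` reader, and the column name `mass`. A real system would use an LLM to infer intent and would read actual HACC output.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Tuple


def read_chunks(n_rows: int, chunk_size: int) -> Iterator[List[dict]]:
    """Hypothetical reader: yields synthetic records chunk by chunk, never all at once."""
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        yield [{"halo_id": i, "mass": float(i % 10), "step": i % 3} for i in range(start, stop)]


@dataclass
class State:
    """Shared state the supervisor passes between agents (hypothetical fields)."""
    question: str
    column: str = ""
    confirmed: bool = False
    partial: Optional[Tuple[float, int]] = None  # running (sum, count) from retrieval
    answer: Optional[float] = None
    trace: List[str] = field(default_factory=list)


def intent_agent(state: State, ask_user: Callable[[str], bool]) -> None:
    # Assumption: the target column is hard-coded; an LLM would infer it from the question.
    state.column = "mass"
    state.confirmed = ask_user(f"Compute the mean of '{state.column}' for: {state.question}?")
    state.trace.append("intent")


def retrieval_agent(state: State, n_rows: int, chunk_size: int) -> None:
    # Stream chunks and reduce on the fly instead of loading the full dataset.
    total, count, chunks = 0.0, 0, 0
    for chunk in read_chunks(n_rows, chunk_size):
        values = [row[state.column] for row in chunk]  # keep only the needed column
        total += sum(values)
        count += len(values)
        chunks += 1
    state.partial = (total, count)
    state.trace.append(f"retrieval: {chunks} chunks")


def analysis_agent(state: State) -> None:
    total, count = state.partial
    state.answer = total / count if count else float("nan")
    state.trace.append("analysis")


def supervisor(question: str, ask_user: Callable[[str], bool],
               n_rows: int = 1000, chunk_size: int = 256) -> State:
    """Route the state to whichever agent handles the next unfinished phase."""
    state = State(question)
    while True:
        if not state.column:
            intent_agent(state, ask_user)
            if not state.confirmed:  # user rejected the plan: stop before touching data
                state.trace.append("aborted")
                return state
        elif state.partial is None:
            retrieval_agent(state, n_rows, chunk_size)
        elif state.answer is None:
            analysis_agent(state)
        else:
            return state


if __name__ == "__main__":
    result = supervisor("What is the mean halo mass?", ask_user=lambda prompt: True)
    print(result.answer)  # 4.5
    print(result.trace)   # ['intent', 'retrieval: 4 chunks', 'analysis']
```

The supervisor decides what runs next only by inspecting which phase of the shared state is still empty. This keeps each agent small and replaceable. Rejecting the confirmation prompt (for example, `ask_user=lambda prompt: False`) ends the run with `answer` left as `None`.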
