A Hierarchical Multi-Agent System for Autonomous Discovery in Geoscientific Data Archives
This work addresses the scalability challenge of reusing geoscientific data, though it appears to be an incremental enhancement of existing LLM-based methods.
The authors address the problem of underutilized Earth science data archives by developing PANGAEA-GPT, a hierarchical multi-agent framework for autonomous data discovery and analysis, and demonstrate that it can execute complex, multi-step workflows with minimal human intervention.
The rapid accumulation of Earth science data has created a significant scalability challenge; while repositories like PANGAEA host vast collections of datasets, citation metrics indicate that a substantial portion remains underutilized, limiting data reusability. Here we present PANGAEA-GPT, a hierarchical multi-agent framework designed for autonomous data discovery and analysis. Unlike standard Large Language Model (LLM) wrappers, our architecture implements a centralized Supervisor-Worker topology with strict data-type-aware routing, sandboxed deterministic code execution, and self-correction via execution feedback, enabling agents to diagnose and resolve runtime errors. Through use-case scenarios spanning physical oceanography and ecology, we demonstrate the system's capacity to execute complex, multi-step workflows with minimal human intervention. This framework provides a methodology for querying and analyzing heterogeneous repository data through coordinated agent workflows.
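To make the architectural terms concrete, the following is a minimal sketch of data-type-aware routing and self-correction via execution feedback. It is an illustration under stated assumptions, not the PANGAEA-GPT implementation: the names `Dataset`, `WORKERS`, `supervisor_route`, `run_isolated`, `toy_repair`, and `execute_with_self_correction` are hypothetical, the data-type labels are invented, a fresh `exec` namespace stands in for a real sandbox, and a hard-coded rule stands in for an LLM-driven repair step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Dataset:
    identifier: str  # placeholder ID, not a real repository record
    data_type: str   # hypothetical type label, e.g. "tabular" or "gridded"


def tabular_worker(ds: Dataset) -> str:
    return f"tabular-worker handled {ds.identifier}"


def gridded_worker(ds: Dataset) -> str:
    return f"gridded-worker handled {ds.identifier}"


# Hypothetical registry: one worker agent per data type.
WORKERS: Dict[str, Callable[[Dataset], str]] = {
    "tabular": tabular_worker,
    "gridded": gridded_worker,
}


def supervisor_route(ds: Dataset) -> str:
    """Strict routing: an unknown data type is rejected, never guessed."""
    worker = WORKERS.get(ds.data_type)
    if worker is None:
        raise ValueError(f"no worker registered for data type {ds.data_type!r}")
    return worker(ds)


def run_isolated(code: str) -> Tuple[bool, str]:
    """Run code in a fresh namespace and report success plus feedback.

    Assumption: this is only a stand-in. A fresh namespace is NOT a
    security sandbox; a real system would isolate the process.
    """
    namespace: Dict[str, object] = {}
    try:
        exec(code, namespace)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, repr(namespace.get("result"))


def toy_repair(code: str, error: str) -> str:
    """Stand-in for an LLM repair step: fixes one known failure mode."""
    if error.startswith("NameError") and "'math'" in error:
        return "import math\n" + code
    return code  # no known fix; the next attempt fails the same way


def execute_with_self_correction(
    code: str,
    repair: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Execute, feed any runtime error back to `repair`, and retry."""
    error = ""
    for attempt in range(1, max_attempts + 1):
        ok, feedback = run_isolated(code)
        if ok:
            return feedback, attempt
        error = feedback
        code = repair(code, error)
    raise RuntimeError(f"gave up after {max_attempts} attempts: {error}")
```

In this sketch, calling `execute_with_self_correction("result = math.sqrt(16)", toy_repair)` fails once on the missing import, is repaired from the error message, and succeeds on the second attempt. Bounding the number of attempts keeps the loop from retrying indefinitely when the repair step cannot resolve the error.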