DC AI DB | Sep 17, 2025

LLM Agents for Interactive Workflow Provenance: Reference Architecture and Evaluation Methodology

Renan Souza, Timothy Poteet, Brian Etz, Daniel Rosendo, Amal Gueroudji, Woong Shin, Prasanna Balaprakash, Rafael Ferreira da Silva

arXiv:2509.13978v2 | 5.96 citations | h-index: 12 | Has Code | SC25-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis

Originality: Incremental advance

AI Analysis

This work gives scientists and researchers who run large-scale scientific workflows a more interactive and accessible way to analyze provenance data. The advance is incremental because it builds on existing LLM and provenance techniques.

The paper tackles the problem of analyzing complex workflow provenance data at scale. It introduces an evaluation methodology and a reference architecture that use interactive LLM agents for runtime data analysis. Evaluations across LLaMA, GPT, Gemini, and Claude on a real-world chemistry workflow show that modular design, prompt tuning, and RAG enable accurate and insightful responses.

Modern scientific discovery increasingly relies on workflows that process data across the Edge, Cloud, and High Performance Computing (HPC) continuum. Comprehensive and in-depth analyses of these data are critical for hypothesis validation, anomaly detection, reproducibility, and impactful findings. Although workflow provenance techniques support such analyses, at large scale, the provenance data become complex and difficult to analyze. Existing systems depend on custom scripts, structured queries, or static dashboards, limiting data interaction. In this work, we introduce an evaluation methodology, reference architecture, and open-source implementation that leverages interactive Large Language Model (LLM) agents for runtime data analysis. Our approach uses a lightweight, metadata-driven design that translates natural language into structured provenance queries. Evaluations across LLaMA, GPT, Gemini, and Claude, covering diverse query classes and a real-world chemistry workflow, show that modular design, prompt tuning, and Retrieval-Augmented Generation (RAG) enable accurate and insightful LLM agent responses beyond recorded provenance.
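The metadata-driven idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields, the JSON query format, and every function name below are hypothetical, and the model call is replaced by a stub so the example runs offline. The point it shows is that only a schema summary (field names and types), not the provenance records themselves, is placed in the prompt, and that the model's reply is a structured query executed by ordinary code.

```python
import json
from typing import Any, Callable

# Hypothetical provenance records; field names and values are illustrative only.
RECORDS = [
    {"task_id": "t1", "activity": "run_dft", "status": "FINISHED", "duration_s": 12.5},
    {"task_id": "t2", "activity": "run_dft", "status": "ERROR", "duration_s": 3.1},
    {"task_id": "t3", "activity": "postprocess", "status": "FINISHED", "duration_s": 0.8},
]


def summarize_schema(records: list[dict[str, Any]]) -> dict[str, str]:
    """Collect field names and value types: metadata only, no record values."""
    schema: dict[str, str] = {}
    for rec in records:
        for key, value in rec.items():
            schema.setdefault(key, type(value).__name__)
    return schema


def build_prompt(schema: dict[str, str], question: str) -> str:
    """Combine the schema summary and the user's question into one prompt."""
    return (
        "Fields: " + json.dumps(schema, sort_keys=True) + "\n"
        'Reply with JSON: {"filter": {field: value}, "select": [fields]}\n'
        "Question: " + question
    )


def run_query(records: list[dict[str, Any]], query: dict[str, Any]) -> list[dict[str, Any]]:
    """Apply an equality filter, then project the selected fields."""
    flt = query.get("filter", {})
    sel = query.get("select", [])
    rows = [r for r in records if all(r.get(k) == v for k, v in flt.items())]
    return [{k: r[k] for k in sel if k in r} for r in rows]


def answer(question: str, llm: Callable[[str], str],
           records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Natural language in, structured query out, executed over the records."""
    prompt = build_prompt(summarize_schema(records), question)
    query = json.loads(llm(prompt))
    return run_query(records, query)


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model call (assumption): always returns a fixed query.
    return json.dumps({"filter": {"status": "ERROR"}, "select": ["task_id", "activity"]})


result = answer("Which tasks failed?", stub_llm, RECORDS)
print(result)  # [{'task_id': 't2', 'activity': 'run_dft'}]
```

Keeping the prompt limited to schema metadata means its size does not grow with the number of provenance records, which is one plausible reading of "lightweight" here. A real agent would also need to validate the model's JSON before executing it.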
