Amal Gueroudji

h-index: 4

Papers: 4

Citations: 53

Novelty: 52%

AI Score: 42

Ranked #59,253 of 194,257 authors (top 31%); #266 in DC (top 27%)

4 Papers

2.3 | MA | Nov 11, 2025

Who Gets the Reward, Who Gets the Blame? Evaluation-Aligned Training Signals for Multi-LLM Agents

Chih-Hsuan Yang, Tanwi Mallick, Le Chen et al.

Large Language Models (LLMs) in multi-agent systems (MAS) have shown promise for complex tasks, yet current training methods lack principled ways to connect system-level evaluation with agent-level and message-level learning. We propose a theoretical framework that unifies cooperative game-theoretic attribution with process reward modeling to transform system evaluation into agent credit and then into response-level signals. Unlike prior approaches that rely only on attribution (e.g., Shapley) or step-level labels (e.g., PRM), our method produces local, signed, and credit-conserving signals. In success cases, Shapley-based credit assignment fairly allocates outcomes across agents and is refined into per-message rewards that promote cooperation while discouraging redundancy or sabotage. In failure cases, first-error localization yields repair-aware preferences that penalize harmful steps while rewarding corrective attempts. The resulting signals are bounded, cooperative, and directly compatible with reinforcement-based or preference-based post-training, providing a unified and auditable pathway from global evaluation to local supervision in LLM multi-agent training. Our contribution is conceptual: we present a theoretical foundation and training signals, leaving empirical validation for future work.
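The Shapley-based credit step mentioned in this abstract can be illustrated with a minimal sketch. It computes exact Shapley values for a tiny cooperative game and is not the authors' implementation: the agent names (`planner`, `coder`, `critic`), the `SCORES` table, and the `evaluate` function are hypothetical stand-ins for a system-level evaluator, and the refinement into per-message rewards described in the abstract is not covered.

```python
from itertools import permutations


def shapley_values(agents, value):
    """Exact Shapley values: average each agent's marginal contribution
    over every ordering in which the agents could join the coalition."""
    credit = {agent: 0.0 for agent in agents}
    orderings = list(permutations(agents))
    for order in orderings:
        coalition = frozenset()
        for agent in order:
            with_agent = coalition | {agent}
            credit[agent] += value(with_agent) - value(coalition)
            coalition = with_agent
    return {agent: total / len(orderings) for agent, total in credit.items()}


# Hypothetical system-level scores for every coalition of three agents.
SCORES = {
    frozenset(): 0.0,
    frozenset({"planner"}): 0.2,
    frozenset({"coder"}): 0.3,
    frozenset({"critic"}): 0.0,
    frozenset({"planner", "coder"}): 0.7,
    frozenset({"planner", "critic"}): 0.3,
    frozenset({"coder", "critic"}): 0.5,
    frozenset({"planner", "coder", "critic"}): 1.0,
}


def evaluate(coalition):
    """Assumed evaluator: look up the score of a coalition of agents."""
    return SCORES[frozenset(coalition)]


credits = shapley_values(["planner", "coder", "critic"], evaluate)
for agent, share in credits.items():
    print(f"{agent}: {share:.2f}")
```

The per-agent credits sum to the full-team score, which is the "credit-conserving" property the abstract refers to. Enumerating all orderings is only practical for a handful of agents; larger systems would need sampling-based approximations.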

5.9 | DC | Sep 17, 2025 | Code

LLM Agents for Interactive Workflow Provenance: Reference Architecture and Evaluation Methodology

Renan Souza, Timothy Poteet, Brian Etz et al.

Modern scientific discovery increasingly relies on workflows that process data across the Edge, Cloud, and High Performance Computing (HPC) continuum. Comprehensive and in-depth analyses of these data are critical for hypothesis validation, anomaly detection, reproducibility, and impactful findings. Although workflow provenance techniques support such analyses, at large scale, the provenance data become complex and difficult to analyze. Existing systems depend on custom scripts, structured queries, or static dashboards, limiting data interaction. In this work, we introduce an evaluation methodology, reference architecture, and open-source implementation that leverages interactive Large Language Model (LLM) agents for runtime data analysis. Our approach uses a lightweight, metadata-driven design that translates natural language into structured provenance queries. Evaluations across LLaMA, GPT, Gemini, and Claude, covering diverse query classes and a real-world chemistry workflow, show that modular design, prompt tuning, and Retrieval-Augmented Generation (RAG) enable accurate and insightful LLM agent responses beyond recorded provenance.

4.1 | DC | Jun 29

StreamGuard: Low-Overhead Resilience for Real-time HPC Data Streams

Hai Duc Nguyen, Bogdan Nicolae, Tekin Bicer et al.

Real-time scientific workflows operate on continuous data streams and must produce timely, high-quality results despite executing on complex, failure-prone infrastructure. Hardware faults, network disruptions, and performance anomalies caused by resource contention or system heterogeneity can severely degrade performance and violate real-time constraints. We focus on strengthening the resilience of the producer-consumer streaming pattern, a fundamental building block of scientific streaming workflows. We present two complementary techniques: (i) a dynamic, asynchronous, non-blocking checkpointing mechanism that preserves progress without interrupting computation, and (ii) a progress-aware load redistribution strategy that detects slow workers and proactively rebalances tasks. Together, these mechanisms maintain forward progress and balanced execution even in highly error-prone environments. Experimental results show that our approach reduces the impact of failures and performance anomalies by up to 6x, while introducing less than 1% overhead in failure-free execution.

1.2 | DC | Oct 21, 2025

A Distributed Framework for Causal Modeling of Performance Variability in GPU Traces

Ankur Lahiry, Ayush Pokharel, Banooqa Banday et al.

Large-scale GPU traces play a critical role in identifying performance bottlenecks within heterogeneous High-Performance Computing (HPC) architectures. However, the sheer volume and complexity of even a single trace make performance analysis both computationally expensive and time-consuming. To address this challenge, we present an end-to-end parallel performance analysis framework designed to handle multiple large-scale GPU traces efficiently. Our proposed framework partitions and processes trace data concurrently and employs causal graph methods and parallel coordinate charts to expose performance variability and dependencies across execution flows. Experimental results demonstrate a 67% improvement in terms of scalability, highlighting the effectiveness of our pipeline for analyzing multiple traces independently.