Amal Gueroudji

DC
h-index12
4papers
9citations
Novelty55%
AI Score45

4 Papers

MANov 11, 2025
Who Gets the Reward, Who Gets the Blame? Evaluation-Aligned Training Signals for Multi-LLM Agents

Chih-Hsuan Yang, Tanwi Mallick, Le Chen et al.

Large Language Models (LLMs) in multi-agent systems (MAS) have shown promise for complex tasks, yet current training methods lack principled ways to connect system-level evaluation with agent-level and message-level learning. We propose a theoretical framework that unifies cooperative game-theoretic attribution with process reward modeling to transform system evaluation into agent credit and then into response-level signals. Unlike prior approaches that rely only on attribution (e.g., Shapley) or step-level labels (e.g., PRM), our method produces local, signed, and credit-conserving signals. In success cases, Shapley-based credit assignment fairly allocates outcomes across agents and is refined into per-message rewards that promote cooperation while discouraging redundancy or sabotage. In failure cases, first-error localization yields repair-aware preferences that penalize harmful steps while rewarding corrective attempts. The resulting signals are bounded, cooperative, and directly compatible with reinforcement-based or preference-based post-training, providing a unified and auditable pathway from global evaluation to local supervision in LLM multi-agent training. Our contribution is conceptual: we present a theoretical foundation and training signals, leaving empirical validation for future work.

DCSep 17, 2025Code
LLM Agents for Interactive Workflow Provenance: Reference Architecture and Evaluation Methodology

Renan Souza, Timothy Poteet, Brian Etz et al.

Modern scientific discovery increasingly relies on workflows that process data across the Edge, Cloud, and High Performance Computing (HPC) continuum. Comprehensive and in-depth analyses of these data are critical for hypothesis validation, anomaly detection, reproducibility, and impactful findings. Although workflow provenance techniques support such analyses, at large scale, the provenance data become complex and difficult to analyze. Existing systems depend on custom scripts, structured queries, or static dashboards, limiting data interaction. In this work, we introduce an evaluation methodology, reference architecture, and open-source implementation that leverages interactive Large Language Model (LLM) agents for runtime data analysis. Our approach uses a lightweight, metadata-driven design that translates natural language into structured provenance queries. Evaluations across LLaMA, GPT, Gemini, and Claude, covering diverse query classes and a real-world chemistry workflow, show that modular design, prompt tuning, and Retrieval-Augmented Generation (RAG) enable accurate and insightful LLM agent responses beyond recorded provenance.

DCJul 15, 2025
PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training

Seth Ockerman, Amal Gueroudji, Tanwi Mallick et al.

Spatiotemporal graph neural networks (ST-GNNs) are powerful tools for modeling spatial and temporal data dependencies. However, their applications have been limited primarily to small-scale datasets because of memory constraints. While distributed training offers a solution, current frameworks lack support for spatiotemporal models and overlook the properties of spatiotemporal data. Informed by a scaling study on a large-scale workload, we present PyTorch Geometric Temporal Index (PGT-I), an extension to PyTorch Geometric Temporal that integrates distributed data parallel training and two novel strategies: index-batching and distributed-index-batching. Our index techniques exploit spatiotemporal structure to construct snapshots dynamically at runtime, significantly reducing memory overhead, while distributed-index-batching extends this approach by enabling scalable processing across multiple GPUs. Our techniques enable the first-ever training of an ST-GNN on the entire PeMS dataset without graph partitioning, reducing peak memory usage by up to 89% and achieving up to a 11.78x speedup over standard DDP with 128 GPUs.

DCOct 21, 2025
A Distributed Framework for Causal Modeling of Performance Variability in GPU Traces

Ankur Lahiry, Ayush Pokharel, Banooqa Banday et al.

Large-scale GPU traces play a critical role in identifying performance bottlenecks within heterogeneous High-Performance Computing (HPC) architectures. However, the sheer volume and complexity of a single trace of data make performance analysis both computationally expensive and time-consuming. To address this challenge, we present an end-to-end parallel performance analysis framework designed to handle multiple large-scale GPU traces efficiently. Our proposed framework partitions and processes trace data concurrently and employs causal graph methods and parallel coordinating chart to expose performance variability and dependencies across execution flows. Experimental results demonstrate a 67% improvement in terms of scalability, highlighting the effectiveness of our pipeline for analyzing multiple traces independently.