SE · AI · Mar 6

XAI for Coding Agent Failures: Transforming Raw Execution Traces into Actionable Insights

arXiv:2603.05941v1 · h-index: 2
Predicted impact: top 91% in SE · last 90 days · Originality: Highly original
AI Analysis

This work addresses the need for interpretable AI in software development workflows, offering a domain-specific way for developers and non-technical users to debug coding agent failures more effectively.

The paper tackles the problem of understanding and debugging failures in LLM-based coding agents by developing a systematic XAI approach that transforms raw execution traces into structured explanations, resulting in users identifying root causes 2.8 times faster and proposing fixes with 73% higher accuracy.

Large Language Model (LLM)-based coding agents show promise in automating software development tasks, yet they frequently fail in ways that are difficult for developers to understand and debug. While general-purpose LLMs like GPT can provide ad-hoc explanations of failures, raw execution traces remain challenging to interpret even for experienced developers. We present a systematic explainable AI (XAI) approach that transforms raw agent execution traces into structured, human-interpretable explanations. Our method consists of three key components: (1) a domain-specific failure taxonomy derived from analyzing real agent failures, (2) an automatic annotation system that classifies failures using a defined annotation schema, and (3) a hybrid explanation generator that produces visual execution flows, natural language explanations, and actionable recommendations. Through a user study with 20 participants (10 technical, 10 non-technical), we demonstrate that our approach enables users to identify failure root causes 2.8 times faster and propose correct fixes with 73% higher accuracy compared to raw execution traces. Importantly, our structured approach outperforms ad-hoc explanations from state-of-the-art models by providing consistent, domain-specific insights with integrated visualizations. Our work establishes a framework for systematic agent failure analysis, addressing the critical need for interpretable AI systems in software development workflows.
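
To make the three-component pipeline in the abstract concrete, here is a minimal Python sketch of how a taxonomy, an annotator, and an explanation generator might fit together. Everything in it is an assumption made for illustration: the category names, the `TraceStep` fields, the keyword rules in `annotate`, and the `RECOMMENDATIONS` text are hypothetical and are not taken from the paper, whose taxonomy and annotation schema are not given in this abstract. The "visual execution flow" is approximated by a plain-text step listing.

```python
from dataclasses import dataclass
from enum import Enum


class FailureCategory(Enum):
    # Hypothetical categories for illustration; not the paper's taxonomy.
    TOOL_ERROR = "tool_error"
    REPEATED_ACTION = "repeated_action"
    NONE = "none"


# Hypothetical recommendation text, one entry per failure category.
RECOMMENDATIONS = {
    FailureCategory.TOOL_ERROR: "Inspect the failing command and its output.",
    FailureCategory.REPEATED_ACTION: "Check whether the agent is stuck in a loop.",
}


@dataclass
class TraceStep:
    index: int   # position of the step in the trace
    action: str  # what the agent did
    output: str  # what the environment returned


def annotate(steps):
    """Label each step using simple keyword and repetition rules (assumed)."""
    labels = []
    seen_actions = set()
    for step in steps:
        text = step.output.lower()
        if "error" in text or "traceback" in text:
            labels.append(FailureCategory.TOOL_ERROR)
        elif step.action in seen_actions:
            labels.append(FailureCategory.REPEATED_ACTION)
        else:
            labels.append(FailureCategory.NONE)
        seen_actions.add(step.action)
    return labels


def explain(steps, labels):
    """Render a text execution flow followed by a one-line recommendation."""
    lines = []
    for step, label in zip(steps, labels):
        marker = "ok" if label is FailureCategory.NONE else label.value
        lines.append(f"[{step.index}] {step.action} -> {marker}")
    failures = [label for label in labels if label is not FailureCategory.NONE]
    if failures:
        first = failures[0]
        lines.append(f"Likely root cause: {first.value}. {RECOMMENDATIONS[first]}")
    else:
        lines.append("No failure detected.")
    return "\n".join(lines)


trace = [
    TraceStep(1, "run tests", "2 passed"),
    TraceStep(2, "edit file", "saved"),
    TraceStep(3, "run tests", "Error: module not found"),
]
report = explain(trace, annotate(trace))
print(report)
```

A rule-based annotator is used here only because it is easy to read; the abstract does not say whether the paper's annotation system is rule-based, model-based, or a mix of both.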

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
