InconLens: Interactive Visual Diagnosis of Behavioral Inconsistencies in LLM-based Agentic Systems
InconLens addresses a practical debugging challenge for developers of LLM-based agentic systems by offering a tool for cross-run analysis, though the contribution is incremental because it builds on existing debugging approaches.
The paper tackles the problem of behavioral inconsistencies in LLM-based agentic systems, where identical inputs can lead to varying success or failure across runs, and introduces InconLens, a visual analytics system that helps developers identify divergence points and uncover failure modes more efficiently.
Large Language Model (LLM)-based agentic systems have shown growing promise in tackling complex, multi-step tasks through autonomous planning, reasoning, and interaction with external environments. However, the stochastic nature of LLM generation introduces intrinsic behavioral inconsistency: the same agent may succeed in one execution but fail in another under identical inputs. Diagnosing such inconsistencies remains a major challenge for developers, as agent execution logs are often lengthy, unstructured, and difficult to compare across runs. Existing debugging and evaluation tools primarily focus on inspecting single executions, offering limited support for understanding how and why agent behaviors diverge across repeated runs. To address this challenge, we introduce InconLens, a visual analytics system designed to support interactive diagnosis of LLM-based agentic systems with a particular focus on cross-run behavioral analysis. InconLens introduces information nodes as an intermediate abstraction that captures canonical informational milestones shared across executions, enabling semantic alignment and inspection of agent reasoning trajectories across multiple runs. We demonstrate the effectiveness of InconLens through a detailed case study and further validate its usability and analytical value via expert interviews. Our results show that InconLens enables developers to more efficiently identify divergence points, uncover latent failure modes, and gain actionable insights into improving the reliability and stability of agentic systems.
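The idea of aligning runs on shared information nodes can be illustrated with a minimal sketch. It is not the InconLens implementation: it assumes each run has already been reduced to an ordered list of information-node labels plus a success flag. The node labels, the `Run` type, and the functions `first_divergence` and `node_coverage` are all hypothetical names introduced here for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed representation: (ordered information-node labels, did the run succeed?)
Run = Tuple[List[str], bool]


def first_divergence(a: List[str], b: List[str]) -> Optional[int]:
    """Return the first index at which two node sequences differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One sequence is a strict prefix of the other: they diverge where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def node_coverage(runs: List[Run]) -> Dict[str, Tuple[float, float]]:
    """Map each node to (fraction of successful runs, fraction of failed runs) that reached it."""
    ok = [set(nodes) for nodes, success in runs if success]
    bad = [set(nodes) for nodes, success in runs if not success]
    labels = set().union(*ok, *bad)

    def frac(group: List[set], label: str) -> float:
        return sum(label in s for s in group) / len(group) if group else 0.0

    return {label: (frac(ok, label), frac(bad, label)) for label in sorted(labels)}


# Hypothetical runs of the same agent on identical input.
runs: List[Run] = [
    (["read_task", "found_config_path", "parsed_schema", "wrote_answer"], True),
    (["read_task", "found_config_path", "wrote_answer"], False),
    (["read_task", "found_config_path", "parsed_schema", "wrote_answer"], True),
]

divergence_index = first_divergence(runs[0][0], runs[1][0])
coverage = node_coverage(runs)
```

In this toy data, the failed run diverges from a successful one at index 2, and `parsed_schema` is reached by every successful run but by no failed run, which is the kind of signal a developer might inspect first. A real system would also need to derive the nodes from raw logs and align them semantically rather than by exact label match.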