SE | LG | Apr 11, 2025

SeaView: Software Engineering Agent Visual Interface for Enhanced Workflow

IBM
arXiv:2504.08696v2 | 2 citations | h-index: 11
Originality: Synthesis-oriented
AI Analysis

SeaView addresses a tooling gap for researchers working on software engineering agents, enabling faster analysis of agent errors and improvements. The contribution is incremental, since it builds on existing agent frameworks.

The paper tackles the problem of analyzing and visualizing the trajectories of auto-regressive LLM-based software engineering agents, which are hard to decipher because of their long sequences and prolonged interactions. It proposes SeaView, a tool that helps researchers visualize and inspect their experiments. According to the paper's user study, gathering the information SeaView provides otherwise takes experienced researchers 10-30 minutes, and diagnosing an experiment can take less experienced researchers 30-60 minutes.

Auto-regressive LLM-based software engineering (SWE) agents, henceforth SWE agents, have made tremendous progress (>60% on SWE-Bench Verified) on real-world coding challenges, including GitHub issue resolution. SWE agents use a combination of reasoning, environment interaction and self-reflection to resolve issues, thereby generating "trajectories". Analysis of SWE agent trajectories is difficult, not only because they can exceed LLM sequence length (sometimes greater than 128k) but also because they involve a relatively prolonged interaction between an LLM and the environment managed by the agent. When an agent error occurs, it can be hard to decipher, locate and understand its scope. Similarly, it can be hard to track improvements or regressions over multiple runs or experiments. While a lot of research has gone into making these SWE agents reach state-of-the-art, much less focus has been put into creating tools to help analyze and visualize agent output. We propose a novel tool called SeaView: Software Engineering Agent Visual Interface for Enhanced Workflow, with a vision to assist SWE-agent researchers in visualizing and inspecting their experiments. SeaView's novel mechanisms help compare experimental runs with varying hyper-parameters or LLMs, and quickly get an understanding of LLM or environment related problems. Based on our user study, experienced researchers spend between 10 and 30 minutes gathering the information provided by SeaView, while researchers with little experience can spend between 30 minutes and 1 hour diagnosing their experiment.
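
To make the idea of tracking improvements and regressions across runs concrete, here is a minimal sketch. It is not SeaView's implementation: the function `diff_runs`, the `RunDiff` type, and the instance ids are hypothetical, and it assumes each run is summarized as a mapping from instance id to whether the issue was resolved.

```python
from typing import Dict, List, NamedTuple


class RunDiff(NamedTuple):
    improved: List[str]   # unresolved in baseline, resolved in candidate
    regressed: List[str]  # resolved in baseline, unresolved in candidate
    unchanged: List[str]  # same outcome in both runs


def diff_runs(baseline: Dict[str, bool], candidate: Dict[str, bool]) -> RunDiff:
    """Compare per-instance resolution status of two runs.

    Only instances present in both runs are compared.
    """
    improved, regressed, unchanged = [], [], []
    for instance_id in sorted(baseline.keys() & candidate.keys()):
        before, after = baseline[instance_id], candidate[instance_id]
        if after and not before:
            improved.append(instance_id)
        elif before and not after:
            regressed.append(instance_id)
        else:
            unchanged.append(instance_id)
    return RunDiff(improved, regressed, unchanged)


# Hypothetical instance ids and outcomes, for illustration only.
run_a = {"task-1": True, "task-2": False, "task-3": True}
run_b = {"task-1": True, "task-2": True, "task-3": False}
result = diff_runs(run_a, run_b)
print("improved:", result.improved)
print("regressed:", result.regressed)
```

A per-instance diff like this only shows which outcomes changed between runs; understanding why still requires inspecting the underlying trajectories, which is the gap the abstract describes.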

