SE · AI · HC · Mar 14

Coding with Eyes: Visual Feedback Unlocks Reliable GUI Code Generating and Debugging

arXiv:2604.19750 · 21.1 · h-index: 2
Predicted impact: top 22% in SE · last 90 days
Originality: Incremental advance
AI Analysis

This addresses a domain-specific bottleneck for developers working on GUI applications, offering an incremental improvement over existing methods.

LLM-based agents struggle with GUI code generation and debugging because they rely on text feedback. The paper introduces VF-Coder, a vision-feedback-based multi-agent system that, on the new InteractGUI Bench benchmark, raises the success rate of Gemini-3-Flash from 21.68% to 28.29% and the visual score from 0.4284 to 0.5584.
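To put the reported numbers in context, the short calculation below derives the absolute and relative gains from the four figures quoted above. The helper name `improvement` is ours, not the paper's, and the calculation uses nothing beyond those four values.

```python
def improvement(before, after):
    """Return (absolute change, relative change as a fraction of `before`)."""
    absolute = after - before
    return absolute, absolute / before


# Figures as quoted in the summary: success rate in percent, visual score unitless.
success_abs, success_rel = improvement(21.68, 28.29)
visual_abs, visual_rel = improvement(0.4284, 0.5584)

print(f"success rate: +{success_abs:.2f} percentage points ({success_rel:.1%} relative)")
print(f"visual score: +{visual_abs:.4f} ({visual_rel:.1%} relative)")
```

Both metrics improve by roughly 30% relative to the baseline, while the absolute success rate stays below 30%.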

Recent advances in Large Language Model (LLM)-based agents have shown remarkable progress in code generation. However, current agent methods mainly rely on text-output-based feedback (e.g., command-line outputs) for multi-round debugging and struggle with graphical user interface (GUI) programs, which involve visual information. This is mainly due to two limitations: 1) GUI programs are event-driven, yet existing methods cannot simulate user interactions to trigger GUI element logic; and 2) GUI programs possess visual attributes, making it difficult for text-based approaches to assess whether the rendered interface meets user needs. To systematically address these challenges, we first introduce InteractGUI Bench, a novel benchmark comprising 984 commonly used real-world desktop GUI application tasks designed for fine-grained evaluation of both interaction logic and visual structure. Furthermore, we propose VF-Coder, a vision-feedback-based multi-agent system for debugging GUI code. By perceiving visual information and directly interacting with program interfaces, VF-Coder can identify potential logic and layout issues in a human-like manner. On InteractGUI Bench, our VF-Coder approach increases the success rate of Gemini-3-Flash from 21.68% to 28.29% and raises the visual score from 0.4284 to 0.5584, indicating the effectiveness of visual feedback in GUI debugging.
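The general idea of a vision-feedback debugging loop can be sketched as follows. This is a minimal illustration under our own assumptions, not VF-Coder's actual architecture: the names `Observation`, `Report`, and `visual_debug_loop`, the three callables, and the toy stand-ins at the bottom are all hypothetical. In a real system the stand-ins would launch the GUI, simulate user events, capture screenshots, and query a vision-capable model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Observation:
    action: str        # simulated user action, e.g. "click:submit"
    screenshot: bytes  # rendered interface captured after the action


@dataclass
class Report:
    issues: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.issues


def visual_debug_loop(
    code: str,
    run_and_interact: Callable[[str], List[Observation]],
    inspect: Callable[[List[Observation]], Report],
    repair: Callable[[str, Report], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Run, observe, and repair until no issues remain or rounds run out.

    Returns the final code and the number of repair rounds applied.
    """
    for rounds_used in range(max_rounds):
        report = inspect(run_and_interact(code))
        if report.passed:
            return code, rounds_used
        code = repair(code, report)
    return code, max_rounds


# Toy stand-ins (hypothetical): the "screenshot" is just the source bytes.
def fake_run(code: str) -> List[Observation]:
    return [Observation(action="click:submit", screenshot=code.encode())]


def fake_inspect(observations: List[Observation]) -> Report:
    issues = []
    if b"on_click" not in observations[0].screenshot:
        issues.append("submit button does nothing when clicked")
    return Report(issues)


def fake_repair(code: str, report: Report) -> str:
    return code + "\nbutton.on_click = submit"


final_code, rounds = visual_debug_loop(
    "button = Button('Submit')", fake_run, fake_inspect, fake_repair
)
print(rounds)
print(final_code)
```

The loop keeps the three roles (interaction, visual inspection, repair) behind plain callables so each could be backed by a separate agent; that split is a design choice of this sketch, not a claim about how the paper divides the work.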

