CVJul 30, 2025

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

arXiv:2507.22827v214 citationsh-index: 11Has Code
Originality Highly original
AI Analysis

This addresses the challenge of automating UI-to-code transformation for software developers, offering a scalable solution with improved robustness over existing methods.

The paper tackles the problem of generating front-end code from UI designs by proposing ScreenCoder, a modular multi-agent framework that decomposes the task into grounding, planning, and generation stages, achieving state-of-the-art performance in layout accuracy, structural coherence, and code correctness.

Automating the transformation of user interface (UI) designs into front-end code holds significant promise for accelerating software development and democratizing design workflows. While multimodal large language models (MLLMs) can translate images to code, they often fail on complex UIs, struggling to unify visual perception, layout planning, and code synthesis within a single monolithic model, which leads to frequent perception and planning errors. To address this, we propose ScreenCoder, a modular multi-agent framework that decomposes the task into three interpretable stages: grounding, planning, and generation. By assigning these distinct responsibilities to specialized agents, our framework achieves significantly higher robustness and fidelity than end-to-end approaches. Furthermore, ScreenCoder serves as a scalable data engine, enabling us to generate high-quality image-code pairs. We use this data to fine-tune open-source MLLM via a dual-stage pipeline of supervised fine-tuning and reinforcement learning, demonstrating substantial gains in its UI generation capabilities. Extensive experiments demonstrate that our approach achieves state-of-the-art performance in layout accuracy, structural coherence, and code correctness. Our code is made publicly available at https://github.com/leigest519/ScreenCoder.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes