CV · AI · CL · Apr 16

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

arXiv:2604.15309 · 96.8 · h-index: 19
Predicted impact: top 6% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For UI/UX designers and builders of automated webpage generation systems, this work addresses the style inconsistency and poor global coherence that arise when AIGC-based elements are generated in isolation.

MM-WebAgent introduces a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through planning and self-reflection, achieving better coherence and visual consistency than code-generation and agent-based baselines.

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments demonstrate that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code & Data: https://aka.ms/mm-webagent.
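As a rough illustration of what "hierarchical planning and iterative self-reflection" could look like in code, the sketch below plans a page with one shared style, generates each element from a prompt that inherits that style, and regenerates an element when a critic returns feedback. It is a minimal sketch under stated assumptions, not the authors' implementation: every name here (`ElementSpec`, `PagePlan`, `plan_page`, `generate_with_reflection`, `build_page`, and the stub generator and critic) is hypothetical, and the stubs stand in for real AIGC tools and a real reviewer model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ElementSpec:
    """One local multimodal element in the page plan (hypothetical structure)."""
    name: str
    kind: str    # e.g. "image", "video", "chart"
    prompt: str  # generation prompt that carries the shared page style


@dataclass
class PagePlan:
    """Global plan: a shared style plus the elements to generate (hypothetical)."""
    style: str
    elements: List[ElementSpec] = field(default_factory=list)


def plan_page(request: str, style: str, kinds: List[str]) -> PagePlan:
    """Global planning step: every element prompt inherits the same style string."""
    elements = [
        ElementSpec(
            name=f"{kind}_{i}",
            kind=kind,
            prompt=f"{request} | {kind} | style: {style}",
        )
        for i, kind in enumerate(kinds)
    ]
    return PagePlan(style=style, elements=elements)


def generate_with_reflection(
    spec: ElementSpec,
    generate: Callable[[str], str],
    critique: Callable[[str, ElementSpec], str],
    max_rounds: int = 3,
) -> str:
    """Local step: generate, critique, and regenerate until the critic is satisfied.

    `critique` returns feedback text, or an empty string when it has no complaints.
    At most `max_rounds` critiques are made; the last asset is returned either way.
    """
    prompt = spec.prompt
    asset = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(asset, spec)
        if not feedback:
            break
        # Fold the feedback into the prompt and try again.
        prompt = f"{prompt} | fix: {feedback}"
        asset = generate(prompt)
    return asset


def build_page(
    plan: PagePlan,
    generate: Callable[[str], str],
    critique: Callable[[str, ElementSpec], str],
) -> Dict[str, str]:
    """Integration step: collect one reviewed asset per planned element."""
    return {
        spec.name: generate_with_reflection(spec, generate, critique)
        for spec in plan.elements
    }


# Stub generator and critic (assumptions for illustration only).
def stub_generate(prompt: str) -> str:
    return f"<asset from: {prompt}>"


def stub_critique(asset: str, spec: ElementSpec) -> str:
    # Complain once, then accept any asset produced from a revised prompt.
    return "" if "fix:" in asset else "match page palette"


if __name__ == "__main__":
    plan = plan_page("bakery landing page", "warm pastel", ["image", "chart"])
    for name, asset in build_page(plan, stub_generate, stub_critique).items():
        print(name, asset)
```

The point of the design is that style is decided once at the plan level and passed down, so elements are not generated in isolation, and the reflection loop gives each element a chance to be corrected before it is integrated.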

