CV · AI · HC · Sep 29, 2025

UI-UG: A Unified MLLM for UI Understanding and Generation

arXiv:2509.24361v2 · 2 citations · h-index: 1 · Has Code
Originality: Incremental advance
AI Analysis

This work targets domain-specific problems in user interface design and automation that matter to developers and designers. Its advance is incremental and comes from integrating UI understanding and UI generation in a single model.

The paper tackles the challenge of improving accuracy in UI understanding and quality in UI generation using a unified multimodal large language model (UI-UG), achieving state-of-the-art performance on understanding tasks and competitive generation performance at reduced computational cost.

Although Multimodal Large Language Models (MLLMs) have been widely applied across domains, they still face challenges in domain-specific tasks, such as User Interface (UI) understanding accuracy and UI generation quality. In this paper, we introduce UI-UG (a unified MLLM for UI Understanding and Generation), which integrates both capabilities. For understanding tasks, we employ Supervised Fine-tuning (SFT) combined with Group Relative Policy Optimization (GRPO) to enhance fine-grained understanding of modern, complex UI data. For generation tasks, we further use Direct Preference Optimization (DPO) so that our model generates human-preferred UIs. In addition, we propose an industrially effective workflow, including the design of an LLM-friendly domain-specific language (DSL), training strategies, rendering processes, and evaluation metrics. In experiments, our model achieves state-of-the-art (SOTA) performance on understanding tasks, outperforming both larger general-purpose MLLMs and similarly sized UI-specialized models. Our model is also on par with these larger MLLMs in UI generation performance at a fraction of the computational cost. We also demonstrate that integrating understanding and generation tasks can improve accuracy and quality for both tasks. Code and Model: https://github.com/neovateai/UI-UG
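The abstract names GRPO and DPO without stating their objectives. As a rough orientation, the sketch below shows the two core quantities in their commonly published forms: the group-relative advantage used by GRPO and the per-pair DPO loss. It is a minimal illustration under assumptions, not the UI-UG training code; the function names, the `beta` value, and the toy rewards are hypothetical, and the paper's actual reward design and hyperparameters are not given here.

```python
import math
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages (generic GRPO form, assumed here).

    Each reward is standardized against the other responses sampled
    for the same prompt, so no learned value function is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (generic form, assumed here).

    loss = -log(sigmoid(beta * margin)), where margin is how much more
    the policy prefers the chosen output than the reference model does.
    """
    margin = ((policy_chosen_logp - policy_rejected_logp)
              - (ref_chosen_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable softplus(-z), which equals -log(sigmoid(z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Hypothetical rewards for four sampled answers to one UI question.
    print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
    # Hypothetical sequence log-probabilities for one preferred/rejected UI pair.
    print(dpo_loss(-10.0, -14.0, -11.0, -12.0))
```

In this form, responses that score above their group mean receive positive advantages, and the DPO loss falls as the policy widens its preference for the chosen UI relative to the reference model.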

