Qualitative Evaluation of LLM-Designed GUI

Bartosz Sawicki, Tomasz Les, Dariusz Parzych, Aleksandra Wycisk-Ficek, Pawel Trebacz, Pawel Zawadzki

arXiv:2601.22759v1 | 3.2 | h-index: 5

Originality: Synthesis-oriented

AI Analysis

This work addresses automated GUI design for developers and designers, but its contribution is incremental: it confirms known limitations of LLMs in this domain.

This study evaluated the usability and adaptability of LLM-generated graphical user interfaces (GUIs) for diverse user needs, finding that while LLMs effectively create structured layouts, they struggle with accessibility standards and interactive functionality, requiring human intervention for usability.

As generative artificial intelligence advances, Large Language Models (LLMs) are being explored for automated graphical user interface (GUI) design. This study investigates the usability and adaptability of LLM-generated interfaces by analysing their ability to meet diverse user needs. The experiments used three state-of-the-art models available in January 2025 (OpenAI GPT o3-mini-high, DeepSeek R1, and Anthropic Claude 3.5 Sonnet) to generate mockups for three interface types: a chat system, a technical team panel, and a manager dashboard. Expert evaluations revealed that while LLMs are effective at creating structured layouts, they struggle to meet accessibility standards and to provide interactive functionality. Further testing showed that LLMs could partially tailor interfaces to different user personas but lacked deeper contextual understanding. The results suggest that while LLMs are promising tools for early-stage UI prototyping, human intervention remains critical to ensure usability, accessibility, and user satisfaction.
