CLDec 12, 2024

Falcon-UI: Understanding GUI Before Following User Instructions

arXiv:2412.09362v122 citationsh-index: 23Has Code
Originality Incremental advance
AI Analysis

This addresses the need for more human-like GUI interaction agents, though it is incremental by building on existing datasets and methods.

The paper tackles the problem of GUI agents lacking GUI context understanding by introducing an instruction-free dataset, Insight-UI Dataset, and a model, Falcon-UI, which achieves accuracy comparable to a much larger model on the AITZ benchmark.

Pursuing human-like interaction for Graphical User Interface (GUI) agents requires understanding the GUI context and following user instructions. However, existing works typically couple these two aspects and focus more on instruct-following abilities, while ignoring the importance of understanding the GUI context. In this paper, we introduce an instruction-free GUI navigation dataset, termed Insight-UI Dataset, to enhance model comprehension of GUI environments. Insight-UI Dataset is automatically generated from the Common Crawl corpus, simulating various platforms -- including iOS, Android, Windows, and Linux -- across multiple resolutions on 312K domains. Although GUI interactions vary by context, diverse interfaces share common internal patterns, such as clicking an item to view its details. It implies the feasibility of independent GUI operation learning, followed by joint optimization with instruction tuning. Thereby, we develop the GUI agent model Falcon-UI, which is initially pretrained on Insight-UI Dataset and subsequently fine-tuned on Android and Web GUI datasets, including AITW, AITZ, Android Control, and Mind2Web. With 7 billion parameters, Falcon-UI achieves accuracy comparable to the 72 billion-parameter Qwen2VL on AITZ, validating the alignment between GUI context comprehension and agent performance. Our code and dataset will be open-sourced.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes