CV | CL | LG | Oct 24, 2024

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Apple | Georgia Tech
arXiv:2410.18967v2 | 52 citations | h-index: 20 | ICLR
Originality: Incremental advance
AI Analysis

This work targets universal UI understanding across platforms such as iPhone, Android, and the web, for both developers and end users. It represents an incremental advance over prior models.

The paper tackles the challenge of building a generalist model for user interface understanding across diverse platforms by introducing Ferret-UI 2, a multimodal large language model that significantly outperforms its predecessor and demonstrates strong cross-platform transfer capabilities on multiple benchmarks.

Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and data limitation. In this paper, we introduce Ferret-UI 2, a multimodal large language model (MLLM) designed for universal UI understanding across a wide range of platforms, including iPhone, Android, iPad, Webpage, and AppleTV. Building on the foundation of Ferret-UI, Ferret-UI 2 introduces three key innovations: support for multiple platform types, high-resolution perception through adaptive scaling, and advanced task training data generation powered by GPT-4o with set-of-mark visual prompting. These advancements enable Ferret-UI 2 to perform complex, user-centered interactions, making it highly versatile and adaptable for the expanding diversity of platform ecosystems. Extensive empirical experiments on referring, grounding, user-centric advanced tasks (comprising 9 subtasks $\times$ 5 platforms), GUIDE next-action prediction dataset, and GUI-World multi-platform benchmark demonstrate that Ferret-UI 2 significantly outperforms Ferret-UI, and also shows strong cross-platform transfer capabilities.

