LGAI, Feb 11

UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics

arXiv:2604.02345, 1 citation, h-index: 1
AI Analysis

This work addresses scalable GUI automation for developers and users. Its approach is new in framing but amounts to an incremental improvement over existing methods.

The paper targets the data scalability bottleneck in training generalist GUI agents. It proposes UI-Oceanus, a framework that trains on forward dynamics prediction using synthetic environmental feedback, and reports average success rate improvements of 7% on offline benchmarks and 16.8% in real-world online navigation.

Scaling generalist GUI agents is hindered by the data scalability bottleneck of expensive human demonstrations and the "distillation ceiling" of synthetic teacher supervision. To transcend these limitations, we propose UI-Oceanus, a framework that shifts the learning focus from mimicking high-level trajectories to mastering interaction physics via ground-truth environmental feedback. Through a systematic investigation of self-supervised objectives, we identify that forward dynamics, defined as the generative prediction of future interface states, acts as the primary driver for scalability and significantly outweighs inverse inference. UI-Oceanus leverages this insight by converting low-cost autonomous exploration, which is verified directly by system execution, into high-density generative supervision to construct a robust internal world model. Experimental evaluations across a series of models demonstrate the decisive superiority of our approach: models utilizing Continual Pre-Training (CPT) on synthetic dynamics outperform non-CPT baselines with an average success rate improvement of 7% on offline benchmarks, which amplifies to a 16.8% gain in real-world online navigation. Furthermore, we observe that navigation performance scales with synthetic data volume. These results confirm that grounding agents in forward predictive modeling offers a superior pathway to scalable GUI automation with robust cross-domain adaptability and compositional generalization.
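The distinction the abstract draws between forward dynamics and inverse inference can be made concrete with a small sketch. The abstract does not specify how exploration data is serialized, so everything below is an assumption for illustration: the `Transition` record, the prompt layout, and the helper names `forward_sample`, `inverse_sample`, and `build_corpus` are hypothetical and are not taken from UI-Oceanus itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One autonomous exploration step (hypothetical record layout)."""
    state: str       # assumed textual serialization of the screen before the action
    action: str      # e.g. "click(settings)"
    next_state: str  # screen observed after the system executed the action


def forward_sample(t: Transition) -> tuple[str, str]:
    """Forward dynamics: given (state, action), the target is the next state."""
    prompt = f"STATE:\n{t.state}\nACTION:\n{t.action}\nNEXT STATE:"
    return prompt, t.next_state


def inverse_sample(t: Transition) -> tuple[str, str]:
    """Inverse inference: given (state, next_state), the target is the action."""
    prompt = f"STATE:\n{t.state}\nNEXT STATE:\n{t.next_state}\nACTION:"
    return prompt, t.action


def build_corpus(transitions: list[Transition],
                 forward_only: bool = True) -> list[tuple[str, str]]:
    """Turn executed transitions into (prompt, target) pairs for pre-training."""
    samples = []
    for t in transitions:
        samples.append(forward_sample(t))
        if not forward_only:
            samples.append(inverse_sample(t))
    return samples


if __name__ == "__main__":
    step = Transition("home screen", "click(settings)", "settings screen")
    for prompt, target in build_corpus([step], forward_only=False):
        print(prompt, "->", target)
```

In this framing the forward target is a full interface state while the inverse target is a single short action, which is one plausible reading of why the abstract calls forward prediction "high-density" supervision.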

