Sep 22, 2025

Orcust: Stepwise-Feedback Reinforcement Learning for GUI Agent

arXiv:2509.17917v1 | 2 citations | h-index: 5
Originality: Incremental advance
AI Analysis

This work targets reasoning reliability and data efficiency for GUI agents in interactive tasks. It represents a strong but specific gain rather than a broad breakthrough.

The paper tackles the problem of unreliable reward signals and limited online trajectory generation in GUI agents by introducing Orcust, a framework that integrates Principle-Constrained Reward Modeling and Online VM-Grounded Trajectory Construction, achieving state-of-the-art performance with improvements of 22.2% on ScreenSpot and 23.9% on ScreenSpot-Pro over the base model.

Recent advances in GUI agents have achieved remarkable grounding and action-prediction performance, yet existing models struggle with unreliable reward signals and limited online trajectory generation. In this paper, we introduce Orcust, a framework that integrates Principle-Constrained Reward Modeling (PCRM) and Online VM-Grounded Trajectory Construction (OVTC) to enhance reasoning reliability and data efficiency in interactive GUI tasks. PCRM leverages environment-verifiable and LLM-derived principles to enforce interpretable reward signals that constrain long chain-of-thought reasoning and rule-based feedback. OVTC spins up instrumented virtual machines to autonomously collect structured GUI interaction trajectories with explicit procedural and structural objectives, enabling the training of a stepwise reward model that robustly captures human preferences and adheres to task-specific constraints. Extensive experiments on standard GUI benchmarks covering perceptual grounding, foundational operations, and end-to-end task execution show that Orcust achieves state-of-the-art performance, improving by 22.2% on ScreenSpot and 23.9% on ScreenSpot-Pro over the base model (i.e., Qwen2.5-VL-7B). The results demonstrate Orcust's effectiveness in enhancing the reasoning, adaptability, and scalability of GUI agents across various environments and task complexities.
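To make the idea of a principle-constrained stepwise reward more concrete, here is a minimal sketch under loudly stated assumptions. It combines an environment-verifiable rule check (a valid action whose predicted point lands inside the target element's bounding box, as in ScreenSpot-style grounding) with a principle-adherence score that would come from an LLM judge. The names `Step`, `verifiable_reward`, `step_reward`, the weight `alpha`, the action set, and the gating rule (principle credit only counts when the rule check passes) are all hypothetical illustrations, not Orcust's actual formulation. The pairwise Bradley-Terry loss is a common way to train reward models from preference pairs; its use here is an assumption, not something the abstract confirms.

```python
import math
from dataclasses import dataclass

# Hypothetical action space; not taken from the paper.
VALID_ACTIONS = {"click", "type", "scroll"}


@dataclass
class Step:
    """Hypothetical record of one GUI step; field names are illustrative only."""
    action: str                                    # predicted action type
    point: tuple[float, float]                     # predicted (x, y) in screen pixels
    target_box: tuple[float, float, float, float]  # ground-truth (x0, y0, x1, y1)
    principle_score: float                         # assumed LLM-judged adherence in [0, 1]


def verifiable_reward(step: Step) -> float:
    """Rule-based signal the environment can check: valid action and point inside the box."""
    if step.action not in VALID_ACTIONS:
        return 0.0
    x, y = step.point
    x0, y0, x1, y1 = step.target_box
    return 1.0 if (x0 <= x <= x1 and y0 <= y <= y1) else 0.0


def step_reward(step: Step, alpha: float = 0.7) -> float:
    """Blend rule and principle signals; principle credit is gated by the rule check (assumed)."""
    rule = verifiable_reward(step)
    return rule * (alpha + (1.0 - alpha) * step.principle_score)


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry preference loss: -log(sigmoid(r_chosen - r_rejected))."""
    return math.log1p(math.exp(-(r_chosen - r_rejected)))


if __name__ == "__main__":
    good = Step("click", (50.0, 50.0), (40.0, 40.0, 60.0, 60.0), 0.9)
    miss = Step("click", (10.0, 10.0), (40.0, 40.0, 60.0, 60.0), 0.9)
    print(step_reward(good), step_reward(miss))
    print(pairwise_loss(step_reward(good), step_reward(miss)))
```

Gating the principle score on the verifiable check is one way to keep an LLM judge from rewarding well-reasoned but wrong actions. A real system could instead weight, threshold, or learn the combination.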

