AI · LG · Feb 15

GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training

arXiv:2602.14093v1 · 3 citations
Originality: Highly original
AI Analysis

The paper addresses two obstacles in GUI agent training, high environment latency and unverifiable rewards, making post-training of agents that interact with graphical user interfaces more efficient and reproducible.

To avoid training GUI agents directly on real-world applications, the paper introduces GUI-GENESIS, a framework that automatically synthesizes efficient GUI environments with verifiable rewards. The reported results are a 10x reduction in environment latency, cost savings of over $28,000 per epoch, and agents that outperform the base model by 14.54% and real-world RL baselines by 3.27% on held-out real-world tasks.

Post-training GUI agents in interactive environments is critical for developing generalization and long-horizon planning capabilities. However, training on real-world applications is hindered by high latency, poor reproducibility, and unverifiable rewards that rely on noisy visual proxies. To address these limitations, we present GUI-GENESIS, the first framework to automatically synthesize efficient GUI training environments with verifiable rewards. GUI-GENESIS reconstructs real-world applications into lightweight web environments using multimodal code models and equips them with code-native rewards: executable assertions that provide deterministic reward signals and eliminate visual estimation noise. Extensive experiments show that GUI-GENESIS reduces environment latency by a factor of 10 and costs by over $28,000 per epoch compared to training on real applications. Notably, agents trained with GUI-GENESIS outperform the base model by 14.54% and even real-world RL baselines by 3.27% on held-out real-world tasks. Finally, we observe that models can synthesize environments they cannot yet solve, highlighting a pathway for self-improving agents.
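The idea of a code-native reward can be illustrated with a small sketch. This is not the paper's implementation: the `Task` class, the `code_native_reward` function, and the shopping-cart state below are all hypothetical names chosen for illustration. The sketch only assumes that a synthesized environment exposes its final state as plain data, so that task success can be checked by executable assertions rather than estimated from screenshots.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical: the synthesized environment exposes its state as a plain dict.
State = Dict[str, Any]


@dataclass
class Task:
    """A task instruction paired with executable success assertions (illustrative)."""
    instruction: str
    assertions: List[Callable[[State], bool]] = field(default_factory=list)


def code_native_reward(task: Task, state: State) -> float:
    """Return 1.0 only if every assertion holds on the final state, else 0.0.

    The result depends solely on the state, so the same trajectory always
    receives the same reward (no visual estimation involved).
    """
    return 1.0 if all(check(state) for check in task.assertions) else 0.0


# Hypothetical task for an imagined shopping-cart environment.
task = Task(
    instruction="Add two units of 'milk' to the cart and open checkout.",
    assertions=[
        lambda s: s.get("cart", {}).get("milk") == 2,
        lambda s: s.get("page") == "checkout",
    ],
)

success_state = {"cart": {"milk": 2}, "page": "checkout"}
failure_state = {"cart": {"milk": 1}, "page": "checkout"}

print(code_native_reward(task, success_state))
print(code_native_reward(task, failure_state))
```

Because the checks run against program state instead of pixels, a reward of this kind is reproducible across runs, which is the property the abstract contrasts with noisy visual proxies.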

