CV · Jan 26

SwipeGen: Bridging the Execution Gap in GUI Agents via Human-like Swipe Synthesis

arXiv:2601.18305v1 · h-index: 5
Originality: Incremental advance
AI Analysis

The work addresses a bottleneck in GUI automation for users who rely on agents to complete tasks, though it is incremental because it focuses specifically on swipe interactions.

The paper tackles GUI agents' poor swipe execution by proposing SwipeGen, an automated pipeline that synthesizes human-like swipe interactions, and GUISwiper, an enhanced GUI agent that achieves 69.07% swipe execution accuracy, a 214% improvement over existing VLM baselines.

With the widespread adoption of Graphical User Interface (GUI) agents for automating GUI interaction tasks, substantial research has focused on improving GUI perception to ground task instructions into concrete action steps. However, the step execution capability of these agents has gradually emerged as a new bottleneck for task completion. In particular, existing GUI agents often adopt overly simplified strategies for handling swipe interactions, preventing them from accurately replicating human-like behavior. To address this limitation, we decompose human swipe gestures into multiple quantifiable dimensions and propose an automated pipeline, SwipeGen, to synthesize human-like swipe interactions through GUI exploration. Based on this pipeline, we construct and release the first benchmark for evaluating the swipe execution capability of GUI agents. Furthermore, leveraging the synthesized data, we propose GUISwiper, a GUI agent with enhanced interaction execution capabilities. Experimental results demonstrate that GUISwiper achieves a swipe execution accuracy of 69.07%, representing a 214% improvement over existing VLM baselines.
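
To put the headline numbers in context, the short sketch below works out the baseline accuracy that the reported figures would imply. It assumes the 214% figure is a relative improvement, (new - old) / old, which the abstract does not state explicitly; the function name `implied_baseline` is hypothetical and used only for illustration.

```python
def implied_baseline(accuracy_pct: float, relative_gain_pct: float) -> float:
    """Return the baseline accuracy implied by a reported accuracy and a
    relative improvement, assuming gain = (new - old) / old.

    Both arguments and the result are percentages (e.g. 69.07 means 69.07%).
    """
    return accuracy_pct / (1.0 + relative_gain_pct / 100.0)


# Figures reported in the abstract: 69.07% accuracy, 214% improvement.
baseline = implied_baseline(69.07, 214.0)
print(f"implied baseline accuracy: {baseline:.2f}%")
```

Under that reading, the strongest existing baseline would sit at roughly 22% swipe execution accuracy. If the 214% were meant differently (for example, as a ratio of new to old), the implied baseline would change accordingly.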

