HC · SE · Apr 15

AppAgent-Claw: CLI Is All You Need for GUI Automation

Zhixue Song, Zhiheng Zhang, Yi Song, Chi Zhang

arXiv:2606.05171 · 81.9

Predicted impact: top 3% in HC · last 90 days

Originality: Synthesis-oriented

AI Analysis

For developers who need to automate GUI-heavy tasks that lack stable APIs, AppAgent-Claw offers a practical, efficient alternative to slow and costly LLM-based agents.

AppAgent-Claw converts GUI workflows into reusable CLI skills through a record-once, replay-many paradigm, eliminating LLM inference at runtime. Layered localization and validation-coupled execution keep replay robust, so GUI tasks can be integrated efficiently into the OpenClaw ecosystem.

The OpenClaw platform provides a practical foundation for automation through its skill-oriented architecture, which organizes external capabilities into lightweight, reusable components that can be invoked efficiently through a command-line interface (CLI). However, a significant bottleneck remains: many real-world tasks are confined to graphical user interfaces (GUIs) with no stable API. While LLM-based GUI agents offer generality, their reliance on repeated live model inference makes them too slow, costly, and inconsistent to serve as efficient OpenClaw skills. In this paper, we present AppAgent-Claw, a demonstration-driven system that converts GUI workflows into reliable, reusable skills that require no runtime inference. Following a "record-once, replay-many" paradigm, the system captures rich contextual metadata during recording to support robust replay. It employs a layered localization strategy to handle visual shifts and a validation-coupled execution model to confirm that each action produces its intended on-screen effect. AppAgent-Claw thus offers a practical, efficient, and diagnosable way to integrate GUI-bound tasks into the OpenClaw ecosystem.
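The listing names the main ideas (a recorded skill with contextual metadata, layered localization, validation-coupled execution, no model calls at replay time) but gives no implementation details. The Python below is a minimal sketch of how those ideas can fit together, under loudly stated assumptions: the screen is modeled as a list of element dicts with hypothetical keys `id`, `text`, and `bounds`; the three fallback layers (element id, visible text, recorded coordinates), the JSON skill format, and the names `RecordedStep`, `load_skill`, `localize`, `replay`, `StepFailed`, and `FakeDevice` are all illustrative choices, not AppAgent-Claw's actual layers, file format, or API. `FakeDevice` is a toy stand-in for a real device backend.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical screen model (an assumption, not the paper's): each UI element is a
# dict with "id", "text", and "bounds" = (x, y, width, height). A real backend
# would derive this from an accessibility tree or a screenshot.
Element = dict
Point = tuple  # (x, y)


@dataclass
class RecordedStep:
    """One demonstrated action plus context captured at record time (illustrative schema)."""
    action: str                        # e.g. "tap"
    element_id: Optional[str]          # element id seen while recording
    element_text: Optional[str]        # visible label seen while recording
    coords: Point                      # absolute point tapped while recording
    expect_text: Optional[str] = None  # label that should appear afterwards


class StepFailed(Exception):
    """Raised when a replayed step does not produce its expected effect."""


def load_skill(text: str) -> list:
    """Parse a recorded skill stored as a JSON list of steps (hypothetical format)."""
    return [
        RecordedStep(
            action=s["action"],
            element_id=s.get("element_id"),
            element_text=s.get("element_text"),
            coords=tuple(s["coords"]),
            expect_text=s.get("expect_text"),
        )
        for s in json.loads(text)
    ]


def center(el: Element) -> Point:
    x, y, w, h = el["bounds"]
    return (x + w // 2, y + h // 2)


def localize(step: RecordedStep, screen: list) -> tuple:
    """Layered localization: try the most stable cue first, then weaker fallbacks."""
    if step.element_id is not None:
        for el in screen:
            if el.get("id") == step.element_id:
                return center(el), "id"
    if step.element_text is not None:
        for el in screen:
            if el.get("text") == step.element_text:
                return center(el), "text"
    return step.coords, "coords"  # last resort: recorded absolute position


def replay(steps: list, get_screen: Callable[[], list],
           perform: Callable[[str, Point], None]) -> list:
    """Validation-coupled replay: no model calls; each step is checked after it runs."""
    log = []
    for i, step in enumerate(steps):
        point, layer = localize(step, get_screen())
        perform(step.action, point)
        if step.expect_text is not None:
            after = get_screen()
            if not any(el.get("text") == step.expect_text for el in after):
                raise StepFailed(
                    f"step {i}: {step.action} at {point} (via {layer}) "
                    f"did not produce {step.expect_text!r}"
                )
        log.append(f"step {i}: {step.action} at {point} via {layer}")
    return log


class FakeDevice:
    """Toy stand-in for a device: tapping the Send button reveals a "Sent" label."""

    def __init__(self):
        self.screen = [{"id": "btn_send", "text": "Send", "bounds": (100, 400, 80, 40)}]

    def get_screen(self) -> list:
        return list(self.screen)

    def perform(self, action: str, point: Point) -> None:
        if action == "tap" and point == (140, 420):
            self.screen.append({"id": "lbl_status", "text": "Sent", "bounds": (100, 300, 80, 20)})


if __name__ == "__main__":
    # Recorded at (150, 430); the button has since moved, so the id layer is used.
    skill = ('[{"action": "tap", "element_id": "btn_send", "element_text": "Send", '
             '"coords": [150, 430], "expect_text": "Sent"}]')
    dev = FakeDevice()
    print(replay(load_skill(skill), dev.get_screen, dev.perform))
    # prints ['step 0: tap at (140, 420) via id']
```

Because the toy button's recorded coordinates no longer match its current position, replay finds it through the more stable id cue instead. If every layer misses and the expected "Sent" label never appears, `replay` raises `StepFailed` naming the step, the action, and the layer used, which is one way a replayed skill can stay diagnosable without invoking a model at runtime.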

