CV | Dec 20, 2023

ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

arXiv:2312.13108v2 | 43 citations | h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses the problem of automating complex desktop tasks for users, but it is incremental as it builds on existing LLM-based methods with a new benchmark and framework.

The paper introduces AssistGUI, a benchmark for evaluating GUI automation on Windows desktop applications, and proposes an Actor-Critic Embodied Agent framework that achieves a 46% success rate on tasks from software like After Effects and MS Word.

Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging Large Language Models (LLMs) or LLM-based AI agents have shown capabilities in automating tasks on Android and Web platforms. However, these tasks are primarily aimed at simple device usage and entertainment operations. This paper presents a novel benchmark, AssistGUI, to evaluate whether models are capable of manipulating the mouse and keyboard on the Windows platform in response to user-requested tasks. We carefully collected a set of 100 tasks from nine widely-used software applications, such as After Effects and MS Word, each accompanied by the necessary project files for better evaluation. Moreover, we propose an advanced Actor-Critic Embodied Agent framework, which incorporates a sophisticated GUI parser driven by an LLM-agent and an enhanced reasoning mechanism adept at handling lengthy procedural tasks. Our experimental results reveal that our GUI Parser and Reasoning mechanism outshine existing methods in performance. Nevertheless, the potential remains substantial, with the best model attaining only a 46% success rate on our benchmark. We conclude with a thorough analysis of the current methods' limitations, setting the stage for future breakthroughs in this domain.

Code Implementations: 1 repo
