Log2Plan: An Adaptive GUI Automation Framework Integrated with Task Mining Approach
Log2Plan targets users who need robust, adaptable automation of repetitive GUI tasks, and proposes a new method for a known bottleneck: planner-executor agents that break under UI changes or long, multi-step tasks.
The paper tackles brittle generalization, high latency, and limited long-horizon coherence in GUI task automation by proposing Log2Plan, an adaptive framework that combines structured two-level planning with task mining from user behavior logs. In an evaluation on 200 real-world tasks, it improves task success rate and execution time, and it maintains a success rate above 60.0% on long-horizon tasks.
GUI task automation streamlines repetitive tasks, but existing LLM- or VLM-based planner-executor agents suffer from brittle generalization, high latency, and limited long-horizon coherence. Their reliance on single-shot reasoning or static plans makes them fragile under UI changes or complex tasks. Log2Plan addresses these limitations by combining a structured two-level planning framework with a task mining approach over user behavior logs, enabling robust and adaptable GUI automation. Log2Plan constructs high-level plans by mapping user commands to a structured task dictionary, enabling consistent and generalizable automation. To support personalization and reuse, it mines user behavior logs to identify user-specific patterns. These high-level plans are then grounded into low-level action sequences by interpreting real-time GUI context, ensuring robust execution across varying interfaces. We evaluated Log2Plan on 200 real-world tasks, demonstrating significant improvements in task success rate and execution time. Notably, it maintains a success rate above 60.0% even on long-horizon task sequences, highlighting its robustness in complex, multi-step workflows.
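The pipeline described above (mine patterns from logs, map a command to high-level tasks through a task dictionary, then ground those tasks against the current GUI) can be illustrated with a toy sketch. This is not the Log2Plan implementation. Every name here (`TASK_DICTIONARY`, `mine_frequent_sequences`, `plan_high_level`, `ground`, `Action`) is hypothetical, the keyword matcher stands in for whatever command-to-task mapping the paper uses, and contiguous n-gram counting stands in for its task mining method.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical task dictionary: command keywords -> ordered high-level task names.
TASK_DICTIONARY = {
    "email report": ["open_app", "attach_file", "send"],
    "rename file": ["open_app", "select_file", "rename"],
}


def mine_frequent_sequences(log, length=2, min_count=2):
    """Count contiguous event n-grams in a behavior log and keep the frequent ones.

    A stand-in for task mining: repeated sequences hint at reusable user patterns.
    """
    windows = (tuple(log[i:i + length]) for i in range(len(log) - length + 1))
    counts = Counter(windows)
    return {seq: n for seq, n in counts.items() if n >= min_count}


def plan_high_level(command):
    """Level 1: map a user command to high-level tasks via keyword lookup (toy matcher)."""
    text = command.lower()
    for key, tasks in TASK_DICTIONARY.items():
        if all(word in text for word in key.split()):
            return list(tasks)
    raise KeyError(f"no task dictionary entry matches: {command!r}")


@dataclass
class Action:
    kind: str    # low-level action type, e.g. "click"
    target: str  # GUI element resolved from the current screen


def ground(tasks, gui_context):
    """Level 2: resolve each high-level task to a concrete action using live GUI state.

    gui_context is an assumed mapping from task name to the element currently on screen.
    """
    actions = []
    for task in tasks:
        element = gui_context.get(task)
        if element is None:
            raise LookupError(f"no GUI element available for task {task!r}")
        actions.append(Action("click", element))
    return actions
```

For example, `plan_high_level("Please email the weekly report")` yields the three-step plan for the `"email report"` entry, and `ground` turns it into click actions only if the current GUI context exposes a matching element for every step. Keeping the plan independent of concrete elements is what lets the same high-level plan survive a UI change: only the grounding step has to be redone.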