HC · Nov 11, 2016

Help, It Looks Confusing: GUI Task Automation Through Demonstration and Follow-up Questions

Thanapong Intharah, Daniyar Turmukhambetov, Gabriel J. Brostow

arXiv:1611.03906v211.523 citations

Originality: Incremental advance

AI Analysis

This work addresses GUI task automation for non-programming users. It appears incremental, since it builds on existing demonstration-based methods and adds user interaction.

The authors tackle the problem of enabling non-programming users to create customized GUI task automation scripts through demonstration and follow-up questions. Their system prototype, HILC, helped users accomplish simple linear tasks, complicated tasks (monitoring, looping, and mixed), and tasks that span multiple executables. When both HILC and the baseline, Sikuli Slides, could perform a task, HILC was trained and refined by the user in less time.

Non-programming users should be able to create their own customized scripts to perform computer-based tasks for them, just by demonstrating to the machine how it's done. To that end, we develop a system prototype which learns-by-demonstration called HILC (Help, It Looks Confusing). Users train HILC to synthesize a task script by demonstrating the task, which produces the needed screenshots and their corresponding mouse-keyboard signals. After the demonstration, the user answers follow-up questions. We propose a user-in-the-loop framework that learns to generate scripts of actions performed on visible elements of graphical applications. While pure programming-by-demonstration is still unrealistic, we use quantitative and qualitative experiments to show that non-programming users are willing and effective at answering follow-up queries posed by our system. Our models of events and appearance are surprisingly simple, but are combined effectively to cope with varying amounts of supervision. The best available baseline, Sikuli Slides, struggled with the majority of the tests in our user study experiments. The prototype with our proposed approach successfully helped users accomplish simple linear tasks, complicated tasks (monitoring, looping, and mixed), and tasks that span across multiple executables. Even when both systems could ultimately perform a task, ours was trained and refined by the user in less time.
