HC · Sep 30, 2025

The Invisible Mentor: Inferring User Actions from Screen Recordings to Recommend Better Workflows

arXiv:2509.26557 · 1 citation · h-index: 65
Originality: Incremental advance
AI Analysis

For users of feature-rich tools who struggle to discover efficient workflows, this system provides automated, vision-based guidance without requiring explicit goal descriptions.

InvisibleMentor uses screen recordings to detect inefficient workflows in tools like Excel and recommends more efficient alternatives, outperforming a prompt-based assistant in actionability and helpfulness.

Many users struggle to notice when a more efficient workflow exists in feature-rich tools like Excel. Existing AI assistants offer help only after users describe their goals or problems, which can be effortful and imprecise. We present InvisibleMentor, a system that turns screen recordings of task completion into vision-grounded reflections on tasks. It detects issues such as repetitive edits and recommends more efficient alternatives based on observed behavior. Unlike prior systems that rely on logs, APIs, or user prompts, InvisibleMentor operates directly on screen recordings. It uses a two-stage pipeline: a vision-language model reconstructs actions and context, and a language model generates structured, high-fidelity suggestions. In evaluation, InvisibleMentor accurately identified inefficient workflows, and participants found its suggestions more actionable, more tailored, and more helpful for learning and improvement than those of a prompt-based spreadsheet assistant.
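As a rough illustration of the two-stage idea described above, the sketch below assumes the first stage has already turned a recording into a list of actions, and shows how a run of repetitive edits might be flagged and turned into a structured suggestion. The `Action` schema, the function names, the `min_run` threshold, and the suggestion text are all hypothetical and are not taken from the paper; no vision-language or language model is actually called here.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Action:
    """One reconstructed user action (hypothetical schema, not from the paper)."""
    kind: str    # e.g. "apply_format" or "type_value"
    target: str  # e.g. a cell reference such as "B2"


def find_repetitive_runs(actions, min_run=3):
    """Return (kind, targets) for each run of at least min_run consecutive same-kind actions."""
    runs = []
    for kind, group in groupby(actions, key=lambda a: a.kind):
        targets = [a.target for a in group]
        if len(targets) >= min_run:
            runs.append((kind, targets))
    return runs


def suggest(runs):
    """Stand-in for the second stage: map detected runs to structured suggestions."""
    return [
        {
            "issue": f"{len(targets)} consecutive '{kind}' actions",
            "cells": targets,
            "suggestion": "Consider one bulk operation over the whole range instead.",
        }
        for kind, targets in runs
    ]


# Stand-in for the first stage: in the described system, a vision-language
# model would reconstruct actions like these from the screen recording.
actions = [
    Action("apply_format", "B2"),
    Action("apply_format", "B3"),
    Action("apply_format", "B4"),
    Action("type_value", "C2"),
]
suggestions = suggest(find_repetitive_runs(actions))
```

Keeping the detection step separate from the suggestion step mirrors the split the abstract describes, though the actual system presumably relies on model outputs rather than a fixed rule like this one.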

