HC · Apr 14

JARVIS: A Just-in-Time AR Visual Instruction System for Cross-Reality Task Guidance

arXiv:2604.10108 · 62.5 · h-index: 13
Predicted impact: top 9% in HC · last 90 days
Originality: Incremental advance
AI Analysis

For users performing hybrid physical-virtual tasks, JARVIS provides automated AR guidance, addressing a gap in existing AI-powered tutorial systems.

JARVIS is a VLM-driven AR system that generates step-by-step cross-reality task guidance from a single prompt, with real-time state verification and adaptive feedback. In a within-subjects study (N=14), it improved usability, workload, success rate, and visualization effectiveness over baselines.

Many everyday tasks rely on external tutorials such as manuals and videos, which require users to constantly switch between reading instructions and performing actions, disrupting their workflow and increasing cognitive load. Augmented reality (AR) enables in-situ guidance, and recent advances in large language models (LLMs) and vision-language models (VLMs) make it possible to generate such guidance automatically. However, existing AI-powered AR tutorial systems focus primarily on physical procedural tasks and offer limited support for hybrid physical-virtual workspaces. To address this gap, we conducted a formative study to understand guidance needs across cross-reality tasks, which we categorize into four types: real-to-real (R2R), real-to-virtual (R2V), virtual-to-real (V2R), and virtual-to-virtual (V2V). The study identified key requirements for state awareness and cross-reality coordination. Informed by these findings, we present JARVIS, a VLM-driven AR instruction system that generates contextual, step-by-step guidance from a single prompt, with real-time state verification and adaptive visual feedback. A within-subjects study (N=14) across four domains shows that JARVIS improves usability, workload, success rate, and visualization effectiveness over baselines.
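The abstract names two ideas without describing their internals: a four-way taxonomy of cross-reality tasks, and step-by-step guidance gated by real-time state verification. The Python below is a minimal, hypothetical sketch of both, not the JARVIS implementation. It assumes the R2R/R2V/V2R/V2V labels denote the workspace a task starts in and the workspace it lands in. `Reality`, `task_type`, `Step`, and `run_guidance` are illustrative names. The `observe` and `verify` callables stand in for camera capture and a VLM check; in the usage lines they are replaced by a list of canned observations and plain string equality.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Reality(Enum):
    REAL = "R"
    VIRTUAL = "V"


def task_type(source: Reality, target: Reality) -> str:
    """Label a task by source and target workspace (assumed reading), e.g. 'R2V'."""
    return f"{source.value}2{target.value}"


@dataclass
class Step:
    instruction: str     # hypothetical: text rendered to the user in AR
    expected_state: str  # hypothetical: state the verifier must confirm


def run_guidance(
    steps: List[Step],
    observe: Callable[[], str],       # stand-in for capturing the current scene
    verify: Callable[[str, str], bool],  # stand-in for a VLM state check
    max_checks: int = 3,
) -> List[str]:
    """Show each step and advance only once verify() confirms the expected state.

    Returns the sequence of messages that would be shown to the user.
    """
    log: List[str] = []
    for i, step in enumerate(steps, start=1):
        log.append(f"Step {i}: {step.instruction}")
        for _ in range(max_checks):
            if verify(observe(), step.expected_state):
                log.append(f"Step {i} verified")
                break
            # Adaptive feedback placeholder: a real system would render a visual hint.
            log.append(f"Step {i} not yet complete; showing corrective hint")
        else:
            # No check succeeded within the budget; stop instead of advancing.
            log.append(f"Step {i} unresolved after {max_checks} checks")
            break
    return log


# Usage with canned observations and exact-match "verification".
frames = iter(["lid closed", "lid open", "cable plugged"])
steps = [Step("Open the lid", "lid open"), Step("Plug in the cable", "cable plugged")]
log = run_guidance(steps, observe=lambda: next(frames), verify=lambda obs, exp: obs == exp)
for line in log:
    print(line)
```

The `for ... else` construct keeps the "advance only after verification" rule explicit. The user stays on a step until the check passes, and the walkthrough stops rather than skipping ahead when the retry budget runs out.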

