CV · Mar 15

MistExit: Learning to Exit for Early Mistake Detection in Procedural Videos

arXiv:2603.14252 · 66.5 · h-index: 21
AI Analysis

This paper addresses efficient error monitoring in procedural tasks, for applications such as training or quality control. It proposes a novel method for a known bottleneck.

The paper tackles early mistake detection in procedural videos: deciding whether a keystep is performed correctly after observing as little of the video as possible. It reports higher mistake detection accuracy than state-of-the-art models while observing a smaller fraction of the video.

We introduce the task of early mistake detection in video, where the goal is to determine whether a keystep in a procedural activity is performed correctly while observing as little of the streaming video as possible. To tackle this problem, we propose a method comprising a mistake detector and a reinforcement learning policy. At each timestep, the detector processes recently observed frames to estimate the keystep's correctness while anticipating future visual features, enabling reliable early mistake estimates. Meanwhile, the policy aggregates the detector outputs and visual observations over time and adaptively decides when to exit (i.e., stop processing incoming frames) while producing the final prediction. Using diverse real-world procedural video datasets, we demonstrate that our MistExit model achieves superior mistake detection accuracy while reducing the fraction of video observed compared to state-of-the-art models. Project: https://vision.cs.utexas.edu/projects/mist_exit.
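The detect-then-exit loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The learned reinforcement learning policy is replaced here by a fixed confidence-margin rule on the running mean of detector outputs, the detector is a toy stand-in that does not anticipate future features, and all names (`early_exit`, `toy_detector`, `window`, `margin`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: each frame is reduced to one scalar feature in [0, 1].
Frame = float


def toy_detector(recent: Sequence[Frame]) -> float:
    """Hypothetical detector: mistake probability = mean of recent features."""
    p = sum(recent) / len(recent)
    return min(1.0, max(0.0, p))


def early_exit(
    frames: Sequence[Frame],
    detector: Callable[[Sequence[Frame]], float] = toy_detector,
    window: int = 2,
    margin: float = 0.3,
) -> Tuple[bool, float]:
    """Process frames in order and stop once the aggregate is confident.

    Returns (is_mistake, fraction_of_video_observed).
    The exit rule is a fixed threshold, standing in for a learned policy.
    """
    if not frames:
        raise ValueError("frames must be non-empty")

    history: List[float] = []  # detector outputs seen so far
    mean_p = 0.5
    t = 0
    for t in range(1, len(frames) + 1):
        recent = frames[max(0, t - window):t]  # recently observed frames only
        history.append(detector(recent))
        mean_p = sum(history) / len(history)   # aggregate over time
        if abs(mean_p - 0.5) >= margin:        # confident enough: exit early
            break

    return mean_p >= 0.5, t / len(frames)


if __name__ == "__main__":
    video = [0.5, 0.6, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    print(early_exit(video))
```

In this sketch, `margin` trades accuracy against the fraction of video observed: a larger margin delays the exit. A learned policy would make that trade-off adaptively per video instead of using one fixed value.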

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
