CV · Jul 17, 2023

Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

Kumar Ashutosh, Santhosh Kumar Ramakrishnan, Triantafyllos Afouras, Kristen Grauman

arXiv:2307.08763v2 · 21.344 citations · h-index: 99

Originality: Highly original

AI Analysis

The paper addresses procedural activity understanding for applications such as following a recipe or a DIY task. Its contribution is not incremental: it introduces a new method for a known bottleneck.

The paper tackles keystep recognition in instructional videos by automatically discovering a task graph from how-to videos. The graph represents keystep sequences probabilistically, and the authors report that using it improves zero-shot keystep localization and video representation learning, exceeding state-of-the-art results on multiple datasets.

Procedural activity understanding requires perceiving human actions in terms of a broader task, where multiple keysteps are performed in sequence across a long video to reach a final goal state -- such as the steps of a recipe or a DIY fix-it task. Prior work largely treats keystep recognition in isolation from this broader structure, or else rigidly confines keysteps to align with a predefined sequential script. We propose discovering a task graph automatically from how-to videos to represent probabilistically how people tend to execute keysteps, and then leverage this graph to regularize keystep recognition in novel videos. On multiple datasets of real-world instructional videos, we show the impact: more reliable zero-shot keystep localization and improved video representation learning, exceeding the state of the art.
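To make the idea of a probabilistic task graph concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names `mine_task_graph` and `decode_with_graph`, the additive smoothing, the toy data, and the choice of Viterbi decoding are all assumptions made for illustration. The sketch estimates keystep transition probabilities from example sequences, then uses them to regularize noisy per-segment keystep scores for a new video.

```python
import math


def mine_task_graph(sequences, num_keysteps, smoothing=0.1):
    """Estimate P(next keystep | current keystep) from example keystep sequences.

    Hypothetical helper: additive smoothing keeps unseen transitions possible.
    """
    counts = [[smoothing] * num_keysteps for _ in range(num_keysteps)]
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1.0
    # Normalize each row so it is a probability distribution over next keysteps.
    return [[c / sum(row) for c in row] for row in counts]


def decode_with_graph(segment_scores, graph):
    """Viterbi decoding: pick the keystep path that best balances the
    per-segment scores and the graph's transition probabilities.

    segment_scores[t][k] is the (assumed) probability of keystep k at segment t.
    A uniform prior over the first keystep is assumed.
    """
    n = len(graph)

    def log(p):
        return math.log(max(p, 1e-12))

    best = [log(s) for s in segment_scores[0]]
    backpointers = []
    for scores in segment_scores[1:]:
        prev_best, best, pointers = best, [], []
        for k in range(n):
            # Best predecessor of keystep k under the transition prior.
            j = max(range(n), key=lambda i: prev_best[i] + log(graph[i][k]))
            best.append(prev_best[j] + log(graph[j][k]) + log(scores[k]))
            pointers.append(j)
        backpointers.append(pointers)

    # Trace the best path backwards from the best final keystep.
    k = max(range(n), key=lambda i: best[i])
    path = [k]
    for pointers in reversed(backpointers):
        k = pointers[k]
        path.append(k)
    return path[::-1]


# Toy keystep sequences (made up): 0 = "crack egg", 1 = "whisk", 2 = "fry".
sequences = [[0, 0, 1, 1, 2, 2], [0, 1, 1, 2], [0, 0, 1, 2, 2]]
graph = mine_task_graph(sequences, num_keysteps=3)

# Noisy per-segment scores for a new video; segment 1 slightly favors keystep 2.
segment_scores = [[0.8, 0.1, 0.1], [0.2, 0.35, 0.45], [0.1, 0.2, 0.7]]
naive = [max(range(3), key=lambda k: s[k]) for s in segment_scores]
path = decode_with_graph(segment_scores, graph)
print("per-segment argmax:", naive)
print("graph-regularized: ", path)
```

On this toy input the per-segment argmax gives `[0, 2, 2]`, while the graph-regularized path is `[0, 1, 2]`, because the mined graph makes a direct jump from keystep 0 to keystep 2 unlikely. The paper's graph construction and inference may differ in detail; the sketch only illustrates how a transition prior can correct isolated per-segment predictions.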
