Box2Flow: Instance-based Action Flow Graphs from Videos
This work addresses the need for rich flow graphs in procedural video analysis, though the contribution is incremental: it builds on existing task-based methods and shifts the focus to single videos.
The paper tackles the problem of learning accurate and detailed step flow graphs from procedural videos, proposing Box2Flow to extract instance-based graphs from single videos, with experiments on MM-ReS and YouCookII showing effective extraction.
A large number of procedural videos on the web show how to complete various tasks. These tasks can often be accomplished in different ways and step orderings: some steps can be performed simultaneously, while others must be completed in a specific order. Flow graphs can be used to illustrate the step relationships of a task. Current task-based methods try to learn a single flow graph for all available videos of a specific task. The extracted flow graphs tend to be too abstract, failing to capture detailed step descriptions. In this work, our aim is to learn accurate and rich flow graphs by extracting them from a single video. We propose Box2Flow, an instance-based method to predict a step flow graph from a given procedural video. Specifically, we extract bounding boxes from videos, predict pairwise edge probabilities between step pairs, and build the flow graph with a spanning tree algorithm. Experiments on MM-ReS and YouCookII show our method can extract flow graphs effectively.
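The final stage of the pipeline, turning pairwise edge probabilities into a graph, can be illustrated with a small sketch. The abstract does not say which spanning tree algorithm is used, so this assumes a Prim-style maximum spanning tree over a symmetric score matrix, with each edge oriented from the earlier step to the later one. The function name `max_spanning_tree` and the input layout are hypothetical and are not taken from Box2Flow.

```python
from typing import List, Tuple


def max_spanning_tree(prob: List[List[float]]) -> List[Tuple[int, int]]:
    """Prim-style maximum spanning tree over a symmetric score matrix.

    prob[i][j] is an assumed pairwise edge probability between steps i and j,
    where steps are indexed in temporal order. Returns edges (i, j) with
    i < j, i.e. oriented from the earlier step to the later one.
    """
    n = len(prob)
    if n == 0:
        return []

    in_tree = [False] * n
    in_tree[0] = True
    best_score = list(prob[0])  # best score linking each step to the tree
    best_from = [0] * n         # tree step that gives that best score
    edges: List[Tuple[int, int]] = []

    for _ in range(n - 1):
        # Pick the step outside the tree with the strongest link into it.
        nxt = max(
            (j for j in range(n) if not in_tree[j]),
            key=lambda j: best_score[j],
        )
        a, b = best_from[nxt], nxt
        edges.append((min(a, b), max(a, b)))
        in_tree[nxt] = True

        # Update the best links now that `nxt` is part of the tree.
        for j in range(n):
            if not in_tree[j] and prob[nxt][j] > best_score[j]:
                best_score[j] = prob[nxt][j]
                best_from[j] = nxt

    return sorted(edges)
```

With a toy four-step matrix in which steps 1 and 2 both link strongly to step 0 and step 3 links most strongly to step 1, the resulting tree has two branches out of step 0. That is the kind of structure a flow graph uses to express steps that can be performed in either order.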