CV, AI, LG | Mar 26, 2022

Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

arXiv:2203.14104v1 | 97 citations | h-index: 97 | Has Code
Originality: Highly original
AI Analysis

The work addresses a limitation of conventional action recognition models: they do not reason about the contextual relations between actions, which matters for tasks such as action segmentation and human activity recognition in long videos.

The paper tackles the problem of understanding sequences of correlated human actions in instructional videos by proposing Bridge-Prompt, a prompt-based framework that models semantics across adjacent actions, achieving state-of-the-art results on benchmarks like GTEA, 50Salads, and Breakfast.

Action recognition models have shown a promising capability to classify human actions in short video clips. In a real scenario, multiple correlated human actions commonly occur in particular orders, forming semantically meaningful human activities. Conventional action recognition approaches focus on analyzing single actions. However, they fail to fully reason about the contextual relations between adjacent actions, which provide potential temporal logic for understanding long videos. In this paper, we propose a prompt-based framework, Bridge-Prompt (Br-Prompt), to model the semantics across adjacent actions, so that it simultaneously exploits both out-of-context and contextual information from a series of ordinal actions in instructional videos. More specifically, we reformulate the individual action labels as integrated text prompts for supervision, which bridge the gap between individual action semantics. The generated text prompts are paired with corresponding video clips, and together co-train the text encoder and the video encoder via a contrastive approach. The learned vision encoder has a stronger capability for ordinal-action-related downstream tasks, e.g., action segmentation and human activity recognition. We evaluate the performance of our approach on several video datasets: Georgia Tech Egocentric Activities (GTEA), 50Salads, and the Breakfast dataset. Br-Prompt achieves state-of-the-art results on multiple benchmarks. Code is available at https://github.com/ttlmh/Bridge-Prompt
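The two ideas in the abstract, turning an ordered list of action labels into text prompts and training paired video/text embeddings contrastively, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the prompt wording, the helper names `build_prompts` and `symmetric_contrastive_loss`, the temperature value, and the choice of a generic InfoNCE-style symmetric cross-entropy, which may differ from the loss and prompt templates used in the paper or its repository. Embeddings are plain Python lists standing in for encoder outputs.

```python
import math

# Hypothetical ordinal vocabulary; the paper's actual prompt templates may differ.
ORDINALS = ["first", "second", "third", "fourth", "fifth"]


def build_prompts(actions):
    """Turn an ordered list of action labels into per-action ordinal prompts
    and one integrated prompt covering the whole sequence."""
    if len(actions) > len(ORDINALS):
        raise ValueError("this sketch supports at most %d actions" % len(ORDINALS))
    per_action = [
        f"This is the {ORDINALS[i]} action: {label}."
        for i, label in enumerate(actions)
    ]
    integrated = " ".join(per_action)
    return per_action, integrated


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def _cross_entropy_on_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Generic symmetric contrastive loss: video i should match text i
    (and vice versa) better than any other item in the batch."""
    logits = [
        [cosine(v, t) / temperature for t in text_embs]
        for v in video_embs
    ]
    video_to_text = _cross_entropy_on_diagonal(logits)
    text_to_video = _cross_entropy_on_diagonal([list(c) for c in zip(*logits)])
    return 0.5 * (video_to_text + text_to_video)


if __name__ == "__main__":
    prompts, integrated = build_prompts(["take bread", "spread jam"])
    print(prompts)
    print(integrated)
```

In a real pipeline the prompts would be tokenized and passed through a text encoder, and the clip through a video encoder; the loss above only shows the pairing logic on precomputed embeddings.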

Code Implementations: 1 repo