CV | Mar 4, 2021

Modeling Multi-Label Action Dependencies for Temporal Action Localization

arXiv:2103.03027v3 | 66 citations
AI Analysis

This work addresses accurate action localization in untrimmed videos, with applications such as video analysis. The contribution is incremental: it builds on existing attention-based methods by refining how action dependencies are modeled.

The paper tackles temporal action localization by modeling relationships between actions, distinguishing co-occurrence dependencies (actions at the same time-step) from temporal dependencies (actions that precede or follow each other). It reports improved performance over state-of-the-art methods on the MultiTHUMOS and Charades benchmarks in terms of f-mAP and a proposed metric.

Real-world videos contain many complex actions with inherent relationships between action classes. In this work, we propose an attention-based architecture that models these action relationships for the task of temporal action localization in untrimmed videos. As opposed to previous works that leverage video-level co-occurrence of actions, we distinguish the relationships between actions that occur at the same time-step and actions that occur at different time-steps (i.e., those which precede or follow each other). We define these distinct relationships as action dependencies. We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer. The MLAD layer consists of two branches: a Co-occurrence Dependency Branch and a Temporal Dependency Branch, which model co-occurrence action dependencies and temporal action dependencies, respectively. We observe that existing metrics used for multi-label classification do not explicitly measure how well action dependencies are modeled; therefore, we propose novel metrics that consider both co-occurrence and temporal dependencies between action classes. Through empirical evaluation and extensive analysis, we show improved performance over state-of-the-art methods on multi-label action localization benchmarks (MultiTHUMOS and Charades) in terms of f-mAP and our proposed metric.
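
The two-branch idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes per-class features `x[t][c]` (a D-dimensional vector for class `c` at time-step `t`), uses plain scaled dot-product self-attention with no learned projections, and mixes the two branches with a fixed weight `alpha`. The names `self_attention`, `mlad_like_layer`, and `alpha` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of D-dim vectors.

    Assumption: queries, keys, and values are the tokens themselves
    (no learned projections), which keeps the sketch dependency-free.
    """
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)]
        )
    return out


def mlad_like_layer(x, alpha=0.5):
    """Two-branch dependency layer in the spirit of the abstract.

    x[t][c] is the feature vector of class c at time-step t.
    alpha is a hypothetical fixed mixing weight between the branches.
    """
    num_steps, num_classes, dim = len(x), len(x[0]), len(x[0][0])

    # Co-occurrence branch: attend across classes within each time-step.
    cooc = [self_attention(x[t]) for t in range(num_steps)]

    # Temporal branch: attend across time-steps within each class.
    temporal = [
        self_attention([x[t][c] for t in range(num_steps)])
        for c in range(num_classes)
    ]

    # Mix the two branches; temporal is indexed [class][time-step].
    return [
        [
            [
                alpha * cooc[t][c][i] + (1 - alpha) * temporal[c][t][i]
                for i in range(dim)
            ]
            for c in range(num_classes)
        ]
        for t in range(num_steps)
    ]
```

Calling `mlad_like_layer(x)` on a nested list of shape (time-steps, classes, D) returns a list of the same shape. The only point of the sketch is the axis each branch attends over: classes at a fixed time-step for co-occurrence dependencies, and time-steps for a fixed class for temporal dependencies.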

Code Implementations: 1 repo
