CV · Nov 19, 2019

Cross-Class Relevance Learning for Temporal Concept Localization

arXiv:1911.08548v1 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of modeling complex class relationships in temporal concept localization for video. It is rated an incremental advance over existing localization architectures.

The paper tackles the problem of temporal concept localization by proposing a Cross-Class Relevance Learning approach that models complex class relationships through pairwise binary relevance prediction, achieving first place out of over 280 teams in the 3rd YouTube-8M Video Understanding Challenge.

We present a novel Cross-Class Relevance Learning approach for the task of temporal concept localization. Most localization architectures rely on feature extraction layers followed by a classification layer which outputs class probabilities for each segment. However, in many real-world applications classes can exhibit complex relationships that are difficult to model with this architecture. In contrast, we propose to incorporate target class and class-related features as input, and learn a pairwise binary model to predict general segment to class relevance. This facilitates learning of shared information between classes, and allows for arbitrary class-specific feature engineering. We apply this approach to the 3rd YouTube-8M Video Understanding Challenge together with other leading models, and achieve first place out of over 280 teams. In this paper we describe our approach and show some empirical results.
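The pairwise formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_pairwise_examples`, the toy feature vectors, and the dot-product "class-related" feature are all assumptions made for illustration. The idea shown is only that each (segment, candidate class) pair becomes one input row with a binary relevance target, so a single binary model can be trained across all classes.

```python
from typing import List, Sequence, Tuple


def build_pairwise_examples(
    segment_features: Sequence[Sequence[float]],
    segment_labels: Sequence[int],
    class_features: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Expand multi-class segment data into (segment, class) binary examples.

    Each row concatenates the segment features, the candidate class features,
    and one hypothetical class-related interaction feature (a dot product).
    The target is 1 when the candidate class is the segment's labeled class.
    """
    rows: List[List[float]] = []
    targets: List[int] = []
    for seg, true_class in zip(segment_features, segment_labels):
        for class_id, cls in enumerate(class_features):
            # Assumed class-related feature: similarity of segment and class.
            interaction = sum(s * c for s, c in zip(seg, cls))
            rows.append(list(seg) + list(cls) + [interaction])
            targets.append(1 if class_id == true_class else 0)
    return rows, targets


# Toy data (hypothetical): two segments, three candidate classes.
segments = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
classes = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

rows, targets = build_pairwise_examples(segments, labels, classes)
```

Any binary classifier could then be fit on `rows` and `targets`. Because the class enters as an input rather than as an output unit, class-specific features can be added per pair without changing the model's output layer.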

