CV · Sep 24, 2022

Global Semantic Descriptors for Zero-Shot Action Recognition

arXiv:2209.12061v1 · 4 citations · h-index: 33 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses zero-shot action recognition, that is, recognizing actions in videos without labeled training examples of the target classes. It offers an incremental improvement by combining two global classifiers, one based on objects and one based on sentences.

The paper introduces a zero-shot action recognition method that uses the relationships between actions, objects, and descriptive sentences to estimate object-action affinities and action class probabilities without hard human labeling. It reports state-of-the-art results on Kinetics-400 and competitive performance on UCF-101.

The success of Zero-shot Action Recognition (ZSAR) methods is intrinsically related to the nature of the semantic side information used to transfer knowledge, although this aspect has rarely been the primary focus of investigation in the literature. This work introduces a new ZSAR method based on two relationships: actions and objects, and actions and descriptive sentences. We demonstrate that representing all object classes using descriptive sentences generates an accurate object-action affinity estimation when a paraphrase estimation method is used as an embedder. We also show how to estimate probabilities over the set of action classes based only on a set of sentences without hard human labeling. In our method, the probabilities from these two global classifiers (i.e., classifiers that use features computed over the entire video) are combined, producing an efficient knowledge transfer model for action classification. Our results are state-of-the-art on the Kinetics-400 dataset and are competitive on UCF-101 under the ZSAR evaluation. Our code is available at https://github.com/valterlej/objsentzsar
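The two-branch idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy 2-D vectors stand in for sentence embeddings, the softmax normalization is an assumption, and the fusion rule (element-wise product followed by renormalization) is an assumption, since the abstract only says the probabilities "are combined". All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def object_branch(object_probs, object_embs, action_embs):
    """Object-based classifier: weight object-action affinities
    (cosine similarity of sentence embeddings) by the object
    probabilities predicted for the whole video."""
    scores = [
        sum(p * cosine(o, a) for p, o in zip(object_probs, object_embs))
        for a in action_embs
    ]
    return softmax(scores)


def sentence_branch(video_sentence_embs, action_embs):
    """Sentence-based classifier: average similarity between sentences
    describing the video and each action description."""
    n = len(video_sentence_embs)
    scores = [
        sum(cosine(s, a) for s in video_sentence_embs) / n
        for a in action_embs
    ]
    return softmax(scores)


def combine(p_obj, p_sent):
    """ASSUMED fusion rule: element-wise product, then renormalize."""
    fused = [a * b for a, b in zip(p_obj, p_sent)]
    total = sum(fused)
    return [f / total for f in fused]


# Toy data (hypothetical): two actions, two objects, one video sentence.
actions = ["playing guitar", "riding horse"]
action_embs = [[1.0, 0.0], [0.0, 1.0]]
object_embs = [[0.9, 0.1], [0.1, 0.9]]   # "guitar", "horse"
object_probs = [0.8, 0.2]                # object classifier output for the video
video_sentence_embs = [[0.7, 0.3]]       # embedding of a sentence about the video

p_obj = object_branch(object_probs, object_embs, action_embs)
p_sent = sentence_branch(video_sentence_embs, action_embs)
p_final = combine(p_obj, p_sent)
predicted = actions[p_final.index(max(p_final))]
print(predicted, p_final)
```

In a real system, the toy vectors would be replaced by embeddings from a paraphrase-style sentence encoder, and the object probabilities would come from a pretrained object classifier applied over the entire video. Neither branch needs labeled examples of the target action classes.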

Code Implementations: 1 repo
