CV · Aug 19, 2022

Hierarchical Compositional Representations for Few-shot Action Recognition

arXiv:2208.09424v3 · 30 citations · h-index: 96
Originality: Incremental advance
AI Analysis

The paper addresses data scarcity in action recognition for applications such as intelligent surveillance and human-computer interaction. The advance is incremental, since it builds on existing few-shot learning approaches.

The paper tackles few-shot action recognition by proposing hierarchical compositional representations (HCR) that decompose actions into sub-actions and fine-grained spatially attentional sub-actions, achieving state-of-the-art results on HMDB51, UCF101, and Kinetics datasets.

Recently, action recognition has received increasing attention for its broad practical applications in intelligent surveillance and human-computer interaction. However, few-shot action recognition has not been well explored and remains challenging because of data scarcity. In this paper, we propose a novel hierarchical compositional representations (HCR) learning approach for few-shot action recognition. Specifically, we divide a complicated action into several sub-actions by carefully designed hierarchical clustering and further decompose the sub-actions into more fine-grained spatially attentional sub-actions (SAS-actions). Although large differences exist between base classes and novel classes, they can share similar patterns in sub-actions or SAS-actions. Furthermore, we adopt the Earth Mover's Distance from the transportation problem to measure the similarity between video samples in terms of sub-action representations. It computes the optimal matching flows between sub-actions as a distance metric, which is well suited to comparing fine-grained patterns. Extensive experiments show that our method achieves state-of-the-art results on the HMDB51, UCF101 and Kinetics datasets.
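
To make the matching step concrete, here is a minimal sketch of an Earth Mover's Distance between two videos, each represented as a small set of sub-action feature vectors. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the ground cost or the sub-action weights, so this sketch assumes cosine distance as the cost, uniform weights, and the same number of sub-actions in both videos. Under those assumptions an optimal transport plan is a one-to-one matching, so a brute-force search over permutations gives the exact value for small sets. The names `cosine_distance` and `emd_uniform` are hypothetical.

```python
from itertools import permutations
from math import sqrt


def cosine_distance(u, v):
    """Ground cost between two feature vectors: 1 - cosine similarity.

    Assumes both vectors are non-zero (assumption of this sketch).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def emd_uniform(video_a, video_b):
    """EMD between two equal-sized sets of sub-action vectors.

    Assumption: every sub-action carries mass 1/n. With n sources and
    n sinks of equal mass, some optimal flow is a permutation, so an
    exhaustive search over permutations is exact (practical only for small n).

    Returns (distance, matching), where matching[i] is the index of the
    sub-action in video_b matched to sub-action i of video_a.
    """
    n = len(video_a)
    if n == 0 or n != len(video_b):
        raise ValueError("this sketch needs two non-empty sets of equal size")

    # Pairwise ground costs between all sub-actions of the two videos.
    cost = [[cosine_distance(a, b) for b in video_b] for a in video_a]

    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm

    # Each matched pair moves mass 1/n, so the distance is the mean matched cost.
    return best_total / n, list(best_perm)


# Toy data: three 2-D "sub-action" vectors per video (made up for illustration).
query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
support = [[0.0, 2.0], [3.0, 3.0], [2.0, 0.0]]
distance, matching = emd_uniform(query, support)
```

In the toy data each query vector has a support vector pointing in the same direction, so the matching pairs them up regardless of order and the distance is close to zero. For realistic set sizes, or for non-uniform weights, the permutation search would be replaced by a linear-programming or optimal-transport solver.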

