CV · Jul 10, 2021

TA2N: Two-Stage Action Alignment Network for Few-shot Action Recognition

arXiv:2107.04782v498 citations · Has Code
Originality: Incremental advance
AI Analysis

This work improves few-shot action recognition for video analysis, though it is incremental as it builds on existing metric learning approaches.

The paper tackles the misalignment problem in few-shot action recognition by proposing a two-stage network that addresses action duration and evolution misalignments, achieving state-of-the-art performance on benchmark datasets.

Few-shot action recognition aims to recognize novel action classes (query) using just a few samples (support). The majority of current approaches follow the metric learning paradigm, which learns to compare the similarity between videos. Recently, it has been observed that directly measuring this similarity is not ideal, since different action instances may show distinct temporal distributions, resulting in severe misalignment between query and support videos. In this paper, we approach this problem from two distinct aspects -- action duration misalignment and action evolution misalignment. We address them sequentially through a Two-stage Action Alignment Network (TA2N). The first stage locates the action by learning a temporal affine transform, which warps each video feature to its action duration while dismissing action-irrelevant features (e.g. background). Next, the second stage coordinates the query feature to match the spatial-temporal action evolution of the support by performing temporal rearrangement and spatial offset prediction. Extensive experiments on benchmark datasets show the potential of the proposed method in achieving state-of-the-art performance for few-shot action recognition. The code of this project can be found at https://github.com/R00Kie-Liu/TA2N
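To make the first stage more concrete, here is a minimal sketch of what a temporal affine warp over per-frame features could look like. It is not taken from the TA2N repository: the function name `temporal_affine_warp`, the normalized [-1, 1] time axis, the linear interpolation, and the clamping at clip borders are all assumptions made for illustration. In the paper the transform is learned by the network, whereas here `scale` and `shift` are simply passed in by hand.

```python
def temporal_affine_warp(features, scale, shift):
    """Resample a sequence of per-frame feature vectors along time.

    Output frame i sits at a normalized position x in [-1, 1]. It reads the
    input at position scale * x + shift using linear interpolation, clamped
    to the clip boundaries. (Hypothetical helper, for illustration only.)
    """
    num_frames = len(features)
    if num_frames == 1:
        return [list(features[0])]

    warped = []
    for i in range(num_frames):
        x = -1.0 + 2.0 * i / (num_frames - 1)       # normalized output position
        src = scale * x + shift                      # affine map to source position
        pos = (src + 1.0) / 2.0 * (num_frames - 1)   # fractional source frame index
        pos = min(max(pos, 0.0), num_frames - 1.0)   # clamp to the clip (assumption)
        lo = int(pos)
        hi = min(lo + 1, num_frames - 1)
        w = pos - lo
        warped.append(
            [(1.0 - w) * a + w * b for a, b in zip(features[lo], features[hi])]
        )
    return warped


# Toy clip: five frames, one feature each, with the value equal to the frame index.
clip = [[0.0], [1.0], [2.0], [3.0], [4.0]]
identity = temporal_affine_warp(clip, scale=1.0, shift=0.0)
zoomed = temporal_affine_warp(clip, scale=0.5, shift=0.0)
```

With `scale=0.5` and `shift=0.0`, the five output frames are sampled only from the middle of the clip (source frames 1 through 3), which loosely mirrors the idea of keeping the action duration and discarding surrounding background frames.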

Code Implementations: 1 repo
