CV · Apr 21, 2020

TAEN: Temporal Aware Embedding Network for Few-Shot Action Recognition

arXiv:2004.10141v2 · 27 citations
AI Analysis

This paper addresses the cost of annotating data for video action recognition by enabling learning from only a few examples. The contribution is incremental, since it builds on existing few-shot learning methods.

The paper tackles few-shot action recognition in videos by proposing TAEN, which represents each action as a trajectory in a metric space to capture both short-term semantics and longer-term connectivity between action parts. Training only a few fully connected layers, it achieves results comparable to prior art on the Kinetics-400 and ActivityNet 1.2 few-shot benchmarks, and state-of-the-art results in certain scenarios.

Classifying new class entities requires collecting and annotating hundreds or thousands of samples, which is often prohibitively costly. Few-shot learning aims to classify new classes using just a few examples. Only a small number of studies address the challenge of few-shot learning on spatio-temporal patterns such as videos. In this paper, we present the Temporal Aware Embedding Network (TAEN) for few-shot action recognition, which learns to represent actions in a metric space as a trajectory, conveying both short-term semantics and longer-term connectivity between action parts. We demonstrate the effectiveness of TAEN on two few-shot tasks, video classification and temporal action detection, and evaluate our method on the Kinetics-400 and ActivityNet 1.2 few-shot benchmarks. By training just a few fully connected layers, we reach results comparable to prior art on both few-shot video classification and temporal detection tasks, while reaching state-of-the-art in certain scenarios.
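To make the "action as a trajectory in a metric space" idea from the abstract concrete, here is a minimal sketch of few-shot classification over trajectories. It is not the TAEN architecture. It assumes each video has already been split into a fixed number of temporal segments and embedded by some backbone. It also assumes, purely for illustration, that a class is summarized by the segment-wise mean of its support trajectories and that trajectories are compared by the mean Euclidean distance between corresponding segments. The names `mean_trajectory`, `trajectory_distance`, and `classify`, and the toy 2-D embeddings, are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# One embedding vector per temporal segment of a video (assumed precomputed).
Trajectory = List[List[float]]


def mean_trajectory(support: Sequence[Trajectory]) -> Trajectory:
    """Average the support trajectories of one class, segment by segment."""
    n_segments = len(support[0])
    dim = len(support[0][0])
    return [
        [sum(traj[s][d] for traj in support) / len(support) for d in range(dim)]
        for s in range(n_segments)
    ]


def trajectory_distance(a: Trajectory, b: Trajectory) -> float:
    """Mean Euclidean distance between corresponding segments (order matters)."""
    return sum(math.dist(x, y) for x, y in zip(a, b)) / len(a)


def classify(query: Trajectory, support_sets: Dict[str, Sequence[Trajectory]]) -> str:
    """Return the class whose prototype trajectory is closest to the query."""
    prototypes = {label: mean_trajectory(trajs) for label, trajs in support_sets.items()}
    return min(prototypes, key=lambda label: trajectory_distance(query, prototypes[label]))


# Toy 2-D embeddings, three segments per video; all values are made up.
support_sets = {
    "rise": [[[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]], [[0.1, 0.0], [0.5, 0.6], [0.9, 1.0]]],
    "fall": [[[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]], [[0.9, 1.0], [0.5, 0.4], [0.1, 0.0]]],
}
query = [[0.9, 0.9], [0.6, 0.5], [0.1, 0.1]]
predicted = classify(query, support_sets)
print(predicted)
```

In this toy setup the two classes visit roughly the same points in the embedding space and differ only in the order of their segments, so pooling all segments into a single vector would make them hard to separate. Comparing segment by segment keeps that temporal order, which is the intuition behind a temporally aware embedding.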
