CVDM · Jul 5, 2023

Task-Specific Alignment and Multiple Level Transformer for Few-Shot Action Recognition

arXiv:2307.01985v2 · 12 citations · h-index: 6 · Has Code
Originality: Incremental advance
AI Analysis

This work improves few-shot action recognition for video analysis, though it is an incremental advance that builds on existing Transformer-based methods.

The paper tackles few-shot action recognition by addressing action-irrelevant video frames and insufficient feature mining, achieving state-of-the-art results on the HMDB51 and UCF101 datasets and competitive performance on the Kinetics and Something-Something V2 benchmarks.

In few-shot learning, the main difference between image-based and video-based tasks is the additional temporal dimension. Recent works have applied the Transformer to video frames to obtain attention features and enhanced prototypes, with competitive results. However, some video frames may be only weakly related to the action, and relying on frame-level or segment-level features alone may not mine enough information. We address these two problems in turn with an end-to-end method named "Task-Specific Alignment and Multiple-level Transformer Network (TSA-MLT)". The first module, TSA, filters out action-irrelevant frames to align action durations: it applies an affine transformation to the frame sequence along the time dimension and resamples it by linear interpolation. The second module, MLT, operates on multiple levels of features of the support prototype and the query sample to mine more information for the alignment. We adopt a fusion loss based on a fusion distance that combines the L2 sequence distance, which focuses on temporal-order alignment, with the Optimal Transport distance, which measures the gap between the appearance and semantics of the videos. Extensive experiments show that our method achieves state-of-the-art results on the HMDB51 and UCF101 datasets and competitive results on the Kinetics and Something-Something V2 benchmarks. Our code is available at: https://github.com/cofly2014/tsa-mlt.git
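To make the TSA idea of "affine transformation in the time dimension plus linear sampling" concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation (see the linked repository for that). It treats a video as a `(T, D)` array of per-frame features, maps a normalized output grid through `p = scale * u + shift` (a 1-D analogue of spatial-transformer grid sampling), and linearly interpolates between neighbouring frames. The function name `affine_temporal_sample` and the fixed `scale`/`shift` values are hypothetical; in TSA-MLT these parameters would be predicted per task.

```python
import numpy as np


def affine_temporal_sample(frames: np.ndarray, scale: float, shift: float,
                           num_samples: int) -> np.ndarray:
    """Hypothetical sketch: resample a (T, D) frame-feature sequence in time.

    A normalized target grid u in [-1, 1] is mapped to p = scale * u + shift,
    p is converted to a fractional frame index, and features are linearly
    interpolated between the two neighbouring frames.
    """
    T = frames.shape[0]
    u = np.linspace(-1.0, 1.0, num_samples)   # normalized output grid
    p = scale * u + shift                     # affine map along the time axis
    x = (p + 1.0) / 2.0 * (T - 1)             # normalized coords -> frame index
    x = np.clip(x, 0.0, T - 1)                # keep sample points inside the clip
    lo = np.floor(x).astype(int)              # left neighbour frame
    hi = np.minimum(lo + 1, T - 1)            # right neighbour frame
    w = (x - lo)[:, None]                     # linear interpolation weight
    return (1.0 - w) * frames[lo] + w * frames[hi]


if __name__ == "__main__":
    clip = np.arange(8, dtype=float)[:, None]  # toy clip: T=8 frames, D=1
    # scale=0.5, shift=0.5 zooms into the second half of the clip
    print(affine_temporal_sample(clip, scale=0.5, shift=0.5, num_samples=3)[:, 0])
```

With `scale=1, shift=0` the sampling is (up to floating-point error) the identity; shrinking `scale` and moving `shift` concentrates the samples on a sub-interval, which is how such a transform can skip frames that are unrelated to the action.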

Code Implementations: 1 repo