CV · Dec 10, 2024

Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence

arXiv:2412.07481v5 · 10 citations · h-index: 26 · AAAI
Originality: Incremental advance
AI Analysis

This work addresses computational efficiency and intra-class variance issues in few-shot action recognition for video analysis, representing an incremental improvement over existing methods.

The paper tackles few-shot action recognition for long video sub-sequences by proposing Manta, a framework that enhances Mamba with local feature modeling and hybrid contrastive learning, achieving new state-of-the-art performance on benchmarks like SSv2, Kinetics, UCF101, and HMDB51.

In few-shot action recognition (FSAR), long video sub-sequences naturally express entire actions more effectively. However, the high computational complexity of mainstream Transformer-based methods limits their application. The recent Mamba architecture models long sequences efficiently, but applying it directly to FSAR overlooks the importance of local feature modeling and alignment. Moreover, long sub-sequences within the same class accumulate intra-class variance, which degrades FSAR performance. To address these challenges, we propose a Matryoshka MAmba and CoNtrasTive LeArning framework (Manta). First, the Matryoshka Mamba introduces multiple Inner Modules to enhance local feature representation, rather than directly modeling global features. An Outer Module captures temporal dependencies between these local features for implicit temporal alignment. Second, a hybrid contrastive learning paradigm, combining supervised and unsupervised methods, is designed to mitigate the negative effects of intra-class variance accumulation. The Matryoshka Mamba and the hybrid contrastive learning paradigm operate in two parallel branches within Manta, enhancing Mamba for FSAR of long sub-sequences. Manta achieves new state-of-the-art performance on prominent benchmarks, including SSv2, Kinetics, UCF101, and HMDB51. Extensive empirical studies show that Manta significantly improves FSAR of long sub-sequences from multiple perspectives.
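The abstract describes the hybrid contrastive paradigm only at a high level, so the following is a minimal sketch of what combining a supervised and an unsupervised contrastive term could look like, not the paper's actual formulation. It assumes a SupCon-style loss for the supervised term and an InfoNCE-style loss for the unsupervised term. The function names, the `weight` mixing parameter, and the `temperature` value are all hypothetical choices made for illustration.

```python
import math


def _normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive(features, labels, temperature=0.1):
    """SupCon-style term (assumed): samples sharing a label are positives."""
    z = [_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # this anchor has no same-class partner in the batch
        log_denom = math.log(
            sum(math.exp(_dot(z[i], z[k]) / temperature) for k in range(n) if k != i)
        )
        log_probs = [_dot(z[i], z[j]) / temperature - log_denom for j in positives]
        total += -sum(log_probs) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def unsupervised_contrastive(view_a, view_b, temperature=0.1):
    """InfoNCE-style term (assumed): view_a[i] and view_b[i] form a positive pair."""
    a = [_normalize(f) for f in view_a]
    b = [_normalize(f) for f in view_b]
    total = 0.0
    for i in range(len(a)):
        logits = [_dot(a[i], b[k]) / temperature for k in range(len(b))]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        total += -(logits[i] - log_denom)
    return total / len(a)


def hybrid_contrastive(features, labels, view_a, view_b, weight=0.5, temperature=0.1):
    """Weighted sum of both terms; `weight` is a hypothetical mixing parameter."""
    sup = supervised_contrastive(features, labels, temperature)
    unsup = unsupervised_contrastive(view_a, view_b, temperature)
    return weight * sup + (1.0 - weight) * unsup


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for pooled video features.
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    other_view = [[0.95, 0.05], [0.85, 0.15], [0.05, 0.95], [0.15, 0.85]]
    print(hybrid_contrastive(feats, [0, 0, 1, 1], feats, other_view))
```

In this sketch the supervised term shrinks as same-class embeddings cluster together, which is one plausible way to counter accumulated intra-class variance, while the unsupervised term needs no labels and only asks that two views of the same sample agree.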

