CV | AI | Apr 21, 2025

Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer

arXiv:2504.14860v1 | 8 citations | h-index: 3 | CVPR
Originality: Incremental advance
AI Analysis

This work addresses the lack of temporal annotations in video analysis for applications like surveillance and content indexing, representing an incremental improvement over existing weakly-supervised methods.

The paper tackles the performance gap between weakly-supervised and fully-supervised temporal action localization by proposing PseudoFormer, a two-branch framework that generates high-quality pseudo labels and leverages multiple priors, achieving state-of-the-art results on THUMOS14 and ActivityNet1.3 benchmarks.

Weakly-supervised Temporal Action Localization (WTAL) has achieved notable success but still suffers from a lack of temporal annotations, leading to a performance and framework gap compared with fully-supervised methods. While recent approaches employ pseudo labels for training, three key challenges remain unresolved: generating high-quality pseudo labels, making full use of different priors, and optimizing training methods with noisy labels. Motivated by these challenges, we propose PseudoFormer, a novel two-branch framework that bridges the gap between weakly and fully-supervised Temporal Action Localization (TAL). We first introduce RickerFusion, which maps all predicted action proposals to a global shared space to generate higher-quality pseudo labels. Subsequently, we leverage both snippet-level and proposal-level labels with different priors from the weak branch to train the regression-based model in the full branch. Finally, an uncertainty mask and an iterative refinement mechanism are applied for training with noisy pseudo labels. PseudoFormer achieves state-of-the-art WTAL results on two commonly used benchmarks, THUMOS14 and ActivityNet1.3. In addition, extensive ablation studies demonstrate the contribution of each component of our method.
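
The abstract does not spell out how RickerFusion works, so the following is only a minimal sketch of the general idea of mapping proposals onto a shared timeline, not the authors' method. It assumes each proposal is a hypothetical `(start, end, score)` triple in snippet units, represents it with a standard Ricker (Mexican-hat) wavelet centred on the proposal, sums all wavelets on one timeline, and thresholds the result to obtain pseudo segments. The function names, the choice of width (half the proposal duration, so the wavelet crosses zero at the proposal boundaries), and the relative threshold are all illustrative assumptions.

```python
import math


def ricker(t, center, width):
    """Unnormalized Ricker (Mexican-hat) wavelet: (1 - u^2) * exp(-u^2 / 2)."""
    u = (t - center) / width
    return (1.0 - u * u) * math.exp(-0.5 * u * u)


def fuse_proposals(proposals, num_snippets, threshold=0.5):
    """Fuse (start, end, score) proposals on one shared timeline.

    Hypothetical helper for illustration only; the paper's actual
    RickerFusion formulation is not given in the abstract.
    Returns the fused timeline and a list of (first, last) snippet indices.
    """
    timeline = [0.0] * num_snippets
    for start, end, score in proposals:
        center = 0.5 * (start + end)
        # Assumption: half the duration as width, so the wavelet is zero
        # at the proposal boundaries and negative just outside them.
        width = max(0.5 * (end - start), 1e-6)
        for t in range(num_snippets):
            timeline[t] += score * ricker(t, center, width)

    peak = max(timeline) if timeline else 0.0
    if peak <= 0.0:
        return timeline, []

    # Keep runs of consecutive snippets whose fused response is at least
    # `threshold` times the peak response.
    segments = []
    seg_start = None
    for t, value in enumerate(timeline):
        if value >= threshold * peak:
            if seg_start is None:
                seg_start = t
        elif seg_start is not None:
            segments.append((seg_start, t - 1))
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, num_snippets - 1))
    return timeline, segments


if __name__ == "__main__":
    # Two overlapping proposals for the same action instance.
    fused, pseudo_segments = fuse_proposals(
        [(10, 20, 0.9), (12, 22, 0.8)], num_snippets=40
    )
    print(pseudo_segments)
```

In this sketch, overlapping proposals reinforce each other near their shared centre, while the negative side lobes of the wavelet suppress responses just outside each proposal, which is one plausible reason a Ricker-shaped kernel could sharpen boundaries compared with simple averaging.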

