CV | Jan 19, 2025

Rethinking Pseudo-Label Guided Learning for Weakly Supervised Temporal Action Localization from the Perspective of Noise Correction

Quan Zhang, Yuxin Qi, Xi Tang, Rui Yuan, Xi Lin, Ke Zhang, Chun Yuan

arXiv:2501.11124v217.412 citations | h-index: 4 | AAAI

Originality: Incremental advance

AI Analysis

This work improves weakly-supervised temporal action localization for video analysis by reducing pseudo-label noise, though it is incremental as it builds on existing pseudo-label methods.

The paper targets pseudo-label noise in weakly-supervised temporal action localization, which causes problems such as inaccurate boundaries and missed short actions. It introduces a two-stage noisy label learning strategy with denoising and teacher-student modules, and reports state-of-the-art detection accuracy and inference speed on the THUMOS14 and ActivityNet v1.2 benchmarks.

Pseudo-label learning methods have been widely applied in weakly-supervised temporal action localization. Existing works directly use a weakly-supervised base model to generate instance-level pseudo-labels for training a fully-supervised detection head. We argue that the noise in these pseudo-labels interferes with the learning of the fully-supervised detection head, leading to significant performance leakage. Issues with noisy labels include: (1) inaccurate boundary localization; (2) undetected short action clips; (3) multiple adjacent segments incorrectly detected as one segment. To target these issues, we introduce a two-stage noisy label learning strategy that harnesses every potentially useful signal in the noisy labels. First, we propose a frame-level pseudo-label generation model with a context-aware denoising algorithm to refine the boundaries. Second, we introduce an online-revised teacher-student framework with a missing instance compensation module and an ambiguous instance correction module to solve the short-action-missing and many-to-one problems. In addition, we apply a high-quality pseudo-label mining loss in the online-revised teacher-student framework, which assigns different weights to the noisy labels so that training is more effective. Our model substantially outperforms the previous state-of-the-art method in both detection accuracy and inference speed on the THUMOS14 and ActivityNet v1.2 benchmarks.
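
The three noise types listed in the abstract can be made concrete with a small diagnostic. The sketch below is not the authors' method: it is a hypothetical helper that compares pseudo-label segments with reference segments and sorts the disagreements into the three categories. The function names, the 0.7 IoU threshold, and the "covers at least half of a reference segment" rule are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def overlap(a: Segment, b: Segment) -> float:
    """Length of the intersection of two 1-D intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D intervals."""
    inter = overlap(a, b)
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def diagnose(pseudo: List[Segment], truth: List[Segment],
             iou_ok: float = 0.7) -> Dict[str, List[Segment]]:
    """Sort pseudo-label errors into the three categories from the abstract.

    iou_ok and the half-coverage rule are assumed values for illustration.
    """
    report: Dict[str, List[Segment]] = {
        "inaccurate_boundary": [], "missed": [], "many_to_one": []}

    # (3) one pseudo segment that covers at least half of two or more
    # reference segments is treated as a many-to-one merge.
    for p in pseudo:
        covered = [t for t in truth if overlap(p, t) >= 0.5 * (t[1] - t[0])]
        if len(covered) >= 2:
            report["many_to_one"].append(p)

    for t in truth:
        hits = [p for p in pseudo if overlap(p, t) > 0]
        if not hits:
            # (2) no pseudo segment touches this reference segment.
            report["missed"].append(t)
            continue
        best = max(hits, key=lambda p: temporal_iou(p, t))
        if best in report["many_to_one"]:
            continue  # already reported as a merge
        if temporal_iou(best, t) < iou_ok:
            # (1) matched, but the boundaries are loose.
            report["inaccurate_boundary"].append(t)
    return report


if __name__ == "__main__":
    truth = [(1.0, 4.0), (6.0, 6.5), (10.0, 12.0), (12.5, 14.0)]
    pseudo = [(1.5, 5.0), (9.8, 14.2)]
    print(diagnose(pseudo, truth))
```

In this toy input, the first reference segment is matched with loose boundaries, the short segment at 6.0 to 6.5 s has no pseudo-label at all, and the last two reference segments are swallowed by a single pseudo segment.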

