CVApr 1, 2025

FDDet: Frequency-Decoupling for Boundary Refinement in Temporal Action Detection

arXiv:2504.00647v11 citationsh-index: 1ICIC
Originality Incremental advance
AI Analysis

This work addresses boundary refinement in temporal action detection for video analysis, offering incremental improvements over existing methods.

The paper tackles the problem of noise and redundancy in pre-trained video features for temporal action detection by proposing a frequency-aware decoupling network, resulting in state-of-the-art performance on benchmarks like THUMOS14, HACS, and ActivityNet-1.3.

Temporal action detection aims to locate and classify actions in untrimmed videos. While recent works focus on designing powerful feature processors for pre-trained representations, they often overlook the inherent noise and redundancy within these features. Large-scale pre-trained video encoders tend to introduce background clutter and irrelevant semantics, leading to context confusion and imprecise boundaries. To address this, we propose a frequency-aware decoupling network that improves action discriminability by filtering out noisy semantics captured by pre-trained models. Specifically, we introduce an adaptive temporal decoupling scheme that suppresses irrelevant information while preserving fine-grained atomic action details, yielding more task-specific representations. In addition, we enhance inter-frame modeling by capturing temporal variations to better distinguish actions from background redundancy. Furthermore, we present a long-short-term category-aware relation network that jointly models local transitions and long-range dependencies, improving localization precision. The refined atomic features and frequency-guided dynamics are fed into a standard detection head to produce accurate action predictions. Extensive experiments on THUMOS14, HACS, and ActivityNet-1.3 show that our method, powered by InternVideo2-6B features, achieves state-of-the-art performance on temporal action detection benchmarks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes