CV · Jul 10, 2025

Multi-Scale Attention and Gated Shifting for Fine-Grained Event Spotting in Videos

arXiv:2507.07381v2 · 1 citation · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses fine-grained action recognition in sports videos, particularly for table tennis, by introducing a novel module and dataset, representing an incremental improvement over existing methods.

The paper tackles precise event spotting in sports videos by proposing a Multi-Scale Attention Gate Shift Module (MSAGSM) that improves temporal modeling and spatial adaptability. It reports new state-of-the-art results on four benchmarks with minimal overhead.

Precise Event Spotting (PES) in sports videos requires frame-level recognition of fine-grained actions from single-camera footage. Existing PES models typically incorporate lightweight temporal modules such as the Gate Shift Module (GSM) or Gate Shift Fuse (GSF) to enrich 2D CNN feature extractors with temporal context. However, these modules are limited in both temporal receptive field and spatial adaptability. We propose a Multi-Scale Attention Gate Shift Module (MSAGSM) that enhances GSM with multi-scale temporal shifts and channel-grouped spatial attention, enabling efficient modeling of both short- and long-term dependencies while focusing on salient regions. MSAGSM is a lightweight, plug-and-play module that integrates seamlessly with diverse 2D backbones. To further advance the field, we introduce the Table Tennis Australia dataset, the first PES benchmark for table tennis, containing over 4,800 precisely annotated events. Extensive experiments across four PES benchmarks demonstrate that MSAGSM consistently improves performance with minimal overhead, setting new state-of-the-art results.
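
To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch of a multi-scale gated temporal shift followed by channel-grouped spatial attention. It is an illustration under stated assumptions, not the authors' implementation. The function names, the shift steps `(1, -1, 2, -2)`, the fixed (non-learned) gate, the parameter-free sigmoid attention map, and the order of the two operations are all hypothetical choices made for this example. In the actual module the gate and attention would be learned.

```python
import numpy as np


def temporal_shift(x, step):
    """Shift x of shape (T, C, H, W) along time by `step` frames, zero-padding the ends."""
    out = np.zeros_like(x)
    if step > 0:
        out[step:] = x[:-step]   # frame t receives features from frame t - step
    elif step < 0:
        out[:step] = x[-step:]   # frame t receives features from frame t + |step|
    else:
        out[:] = x
    return out


def multi_scale_gated_shift(x, gate, steps=(1, -1, 2, -2)):
    """Split channels into len(steps) groups; shift each group's gated part by its own step.

    `gate` has the same shape as x. Here it is a fixed array (an assumption);
    a real module would predict it from the features.
    """
    gated = gate * x             # portion of the features routed through the shift
    residual = x - gated         # portion that stays at its original frame
    groups = np.array_split(np.arange(x.shape[1]), len(steps))
    shifted = np.zeros_like(x)
    for idx, step in zip(groups, steps):
        shifted[:, idx] = temporal_shift(gated[:, idx], step)
    return shifted + residual


def grouped_spatial_attention(x, num_groups=2):
    """Reweight each channel group by a sigmoid map of its own channel-mean activation."""
    out = np.empty_like(x)
    for idx in np.array_split(np.arange(x.shape[1]), num_groups):
        mean_map = x[:, idx].mean(axis=1, keepdims=True)  # shape (T, 1, H, W)
        attn = 1.0 / (1.0 + np.exp(-mean_map))            # values in (0, 1)
        out[:, idx] = x[:, idx] * attn
    return out


# Usage: 8 frames, 8 channels, 4x4 feature maps, half of every feature shifted.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4, 4))
gate = np.full_like(x, 0.5)
y = grouped_spatial_attention(multi_scale_gated_shift(x, gate))
print(y.shape)  # (8, 8, 4, 4)
```

Steps of ±1 mix information from adjacent frames, while steps of ±2 reach further in time without adding layers, which is one way to read the "multi-scale" claim. Both operations preserve the input shape, which is consistent with the plug-and-play description.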

