CV · May 20, 2022

Temporally Precise Action Spotting in Soccer Videos Using Dense Detection Anchors

João V. B. Soares, Avijit Shah, Topojoy Biswas

arXiv:2205.10450v2 · 16.340 citations · h-index: 4 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of accurately localizing actions in sports videos. The advance is incremental: it builds on existing methods and adds specific enhancements.

The paper tackles the problem of temporally precise action spotting in soccer videos by using dense detection anchors with fine-grained temporal displacement predictions, achieving a new state-of-the-art on the SoccerNet-v2 dataset with marked improvements in localization.

We present a model for temporally precise action spotting in videos, which uses a dense set of detection anchors, predicting a detection confidence and corresponding fine-grained temporal displacement for each anchor. We experiment with two trunk architectures, both of which are able to incorporate large temporal contexts while preserving the smaller-scale features required for precise localization: a one-dimensional version of a u-net, and a Transformer encoder (TE). We also suggest best practices for training models of this kind, by applying Sharpness-Aware Minimization (SAM) and mixup data augmentation. We achieve a new state-of-the-art on SoccerNet-v2, the largest soccer video dataset of its kind, with marked improvements in temporal localization. Additionally, our ablations show: the importance of predicting the temporal displacements; the trade-offs between the u-net and TE trunks; and the benefits of training with SAM and mixup.
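To make the dense-anchor idea concrete, here is a minimal sketch of how per-anchor outputs could be turned into action spots. It assumes one anchor every `anchor_stride_s` seconds, a confidence threshold, and a greedy non-maximum suppression step to merge anchors that point at the same moment. The function name, the parameter values, and the suppression step are all illustrative assumptions; the abstract does not describe the authors' post-processing.

```python
from typing import List, Tuple


def decode_anchors(
    confidences: List[float],
    displacements: List[float],
    anchor_stride_s: float = 0.5,   # hypothetical spacing between anchors, in seconds
    threshold: float = 0.5,         # hypothetical confidence cutoff
    nms_window_s: float = 2.0,      # hypothetical suppression window, in seconds
) -> List[Tuple[float, float]]:
    """Convert per-anchor (confidence, displacement) pairs into
    (time_in_seconds, confidence) detections, sorted by time."""
    if len(confidences) != len(displacements):
        raise ValueError("confidences and displacements must have equal length")

    # Each anchor i sits at i * stride; the predicted displacement refines it.
    candidates = [
        (i * anchor_stride_s + disp, conf)
        for i, (conf, disp) in enumerate(zip(confidences, displacements))
        if conf >= threshold
    ]

    # Greedy suppression (an assumption): visit candidates from most to least
    # confident and drop any that fall within the window of a kept detection.
    candidates.sort(key=lambda c: c[1], reverse=True)
    kept: List[Tuple[float, float]] = []
    for time_s, conf in candidates:
        if all(abs(time_s - kept_time) > nms_window_s for kept_time, _ in kept):
            kept.append((time_s, conf))

    return sorted(kept)


if __name__ == "__main__":
    confs = [0.1, 0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.7]
    disps = [0.0, 0.2, -0.3, 0.0, 0.0, 0.0, 0.0, -0.1]
    for time_s, conf in decode_anchors(confs, disps):
        print(f"action at {time_s:.2f}s (confidence {conf:.2f})")
```

In this toy input, the anchors at indices 1 and 2 both point at roughly the same moment after their displacements are applied, so only the more confident one survives. This shows why a displacement output can give localization finer than the anchor spacing.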
