CV · Jul 20, 2022

Spotting Temporally Precise, Fine-Grained Events in Video

arXiv:2207.10213v1 · 57 citations · h-index: 35
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of detecting the exact moments at which events occur in video, for applications such as sports analysis. The advance is incremental because it builds on existing video action detection and segmentation methods.

The paper introduces the task of spotting temporally precise, fine-grained events in video and proposes E2E-Spot, a compact end-to-end model that significantly outperforms baselines adapted from prior video understanding tasks.

We introduce the task of spotting temporally precise, fine-grained events in video (detecting the precise moment in time events occur). Precise spotting requires models to reason globally about the full-time scale of actions and locally to identify subtle frame-to-frame appearance and motion differences that identify events during these actions. Surprisingly, we find that top performing solutions to prior video understanding tasks such as action detection and segmentation do not simultaneously meet both requirements. In response, we propose E2E-Spot, a compact, end-to-end model that performs well on the precise spotting task and can be trained quickly on a single GPU. We demonstrate that E2E-Spot significantly outperforms recent baselines adapted from the video action detection, segmentation, and spotting literature to the precise spotting task. Finally, we contribute new annotations and splits to several fine-grained sports action datasets to make these datasets suitable for future work on precise spotting.
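To make "detecting the precise moment in time events occur" concrete, the sketch below scores a list of predicted event frames against annotated frames, counting a prediction as correct only if it falls within a small frame tolerance of an unmatched annotation. This is an illustrative assumption, not the paper's evaluation protocol: the function names `match_events` and `precision_recall`, the greedy nearest-frame matching, and the default tolerance of 1 frame are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def match_events(predicted: List[int], ground_truth: List[int],
                 tolerance: int = 1) -> Tuple[int, int, int]:
    """Greedily match predicted event frames to annotated frames.

    A prediction is a true positive if an unmatched annotated frame lies
    within `tolerance` frames of it. Each annotation is matched at most once.
    Returns (true_positives, false_positives, false_negatives).
    NOTE: hypothetical helper for illustration, not the paper's metric.
    """
    unmatched = sorted(ground_truth)
    true_pos = 0
    for p in sorted(predicted):
        # Find the closest still-unmatched annotation within the tolerance.
        best = None
        for g in unmatched:
            if abs(g - p) <= tolerance and (best is None or abs(g - p) < abs(best - p)):
                best = g
        if best is not None:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(unmatched)
    return true_pos, false_pos, false_neg


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Compute precision and recall, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted_frames = [10, 52, 90]   # made-up example predictions
    annotated_frames = [11, 50, 120]  # made-up example annotations
    for tol in (1, 2):
        tp, fp, fn = match_events(predicted_frames, annotated_frames, tol)
        print(tol, (tp, fp, fn), precision_recall(tp, fp, fn))
```

With the made-up frames above, tightening the tolerance from 2 frames to 1 turns the prediction at frame 52 from a hit into a miss, which is the sense in which the task demands frame-level precision rather than coarse localization.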

Code Implementations: 2 repos
