CV · Apr 30, 2019

Attentive Spatio-Temporal Representation Learning for Diving Classification

arXiv:1905.00050v1 · 30 citations
Originality: Incremental advance
AI Analysis

This work addresses the specific problem of automated diving classification for sports analytics, representing an incremental improvement with novel attention mechanisms.

The authors tackled the problem of classifying diving actions from video by proposing an attention-guided LSTM-based neural network, which improved classification accuracy over state-of-the-art models by 11.54% for 2D and 4.24% for 3D frameworks on the Diving48 dataset.

Competitive diving is a well-recognized aquatic sport in which a person dives from a platform or a springboard into the water. Based on the acrobatics performed during the dive, diving is classified into a finite set of action classes which are standardized by FINA. In this work, we propose an attention-guided LSTM-based neural network architecture for the task of diving classification. The network takes the frames of a diving video as input and determines its class. We evaluate the performance of the proposed model on a recently introduced competitive diving dataset, Diving48. It contains over 18,000 video clips covering 48 classes of diving. The proposed model exceeds the classification accuracy of state-of-the-art models in both 2D and 3D frameworks, by 11.54% and 4.24%, respectively. We show that the network is able to localize the diver in the video frames during the dive without being trained with such supervision.
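The localization claim in the abstract is easier to picture with a small example of soft spatial attention. The sketch below is illustrative only and is not the authors' architecture: it assumes each frame has already been reduced to a list of per-region feature vectors, that a query vector stands in for a recurrent hidden state, and that regions are scored by a plain dot product. The names `attend` and `attention_maps` are hypothetical. The point it shows is that the attention weights form a per-frame map over regions, which can highlight where the model is looking even though no location labels were used.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, query):
    """Soft attention over the spatial regions of one frame.

    regions: list of feature vectors, one per spatial location (assumed input).
    query:   vector of the same length, standing in for a recurrent state.
    Returns (weights, context). Dot-product scoring is an assumption here.
    """
    scores = [sum(r * q for r, q in zip(region, query)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, regions))
        for d in range(dim)
    ]
    return weights, context


def attention_maps(video, query):
    """One attention map per frame; each map is a list of region weights."""
    return [attend(frame, query)[0] for frame in video]


# Toy frame with three regions and two feature channels.
frame = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend(frame, query=[2.0, 0.0])
```

In this toy frame the first region aligns best with the query, so it receives the largest weight. In a full model the context vector would be fed to the recurrent unit at each time step, and the per-frame weights could be reshaped into a heat map over the image.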

