TTNet: Real-time temporal and spatial video analysis of table tennis
This work addresses the automation of referee decisions and data collection in sports, specifically table tennis. The contribution is incremental, as it builds on existing multi-task deep learning approaches.
The authors tackled real-time analysis of table tennis videos by developing TTNet, a neural network that achieved 97.0% accuracy in event spotting and, for ball detection, 97.5% accuracy with a 2-pixel RMSE on their dataset.
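To make the two ball-detection figures concrete, the sketch below computes an RMSE over predicted versus labeled ball centers and a detection accuracy. It is a minimal illustration under stated assumptions, not the paper's evaluation code: RMSE is taken here as the root-mean-square Euclidean distance in pixels, and "accuracy" is assumed to mean the fraction of frames whose prediction lies within a hypothetical pixel threshold `threshold_px`. The function names `ball_rmse` and `detection_accuracy` are likewise hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) ball center in pixels


def _check(pred: Sequence[Point], true: Sequence[Point]) -> None:
    # Both sequences must describe the same, non-empty set of frames.
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be equal-length, non-empty sequences")


def ball_rmse(pred: Sequence[Point], true: Sequence[Point]) -> float:
    """Root-mean-square Euclidean distance (pixels) between predicted and labeled centers.

    Assumption: RMSE is computed over per-frame Euclidean errors.
    """
    _check(pred, true)
    squared = [(px - tx) ** 2 + (py - ty) ** 2 for (px, py), (tx, ty) in zip(pred, true)]
    return math.sqrt(sum(squared) / len(squared))


def detection_accuracy(pred: Sequence[Point], true: Sequence[Point], threshold_px: float) -> float:
    """Fraction of frames whose prediction is within threshold_px of the label.

    Assumption: this hit criterion is illustrative; the paper's exact definition may differ.
    """
    _check(pred, true)
    hits = sum(
        1 for (px, py), (tx, ty) in zip(pred, true) if math.hypot(px - tx, py - ty) <= threshold_px
    )
    return hits / len(true)


if __name__ == "__main__":
    pred = [(10.0, 10.0), (20.0, 23.0), (30.0, 30.0)]
    true = [(10.0, 12.0), (20.0, 20.0), (30.0, 30.0)]
    print(ball_rmse(pred, true))                 # per-frame errors 2, 3, 0 px -> sqrt(13/3)
    print(detection_accuracy(pred, true, 2.0))   # 2 of 3 frames within 2 px
```

Because RMSE squares each error, a few large misses dominate it, which is why reporting it alongside a thresholded accuracy gives a fuller picture of small-object localization.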
We present TTNet, a neural network aimed at real-time processing of high-resolution table tennis videos that provides both temporal (event spotting) and spatial (ball detection and semantic segmentation) data. This approach supplies the core information needed to reason about score updates in an auto-referee system. We also publish OpenTTGames, a multi-task dataset of table tennis game videos recorded at 120 fps and labeled with events, semantic segmentation masks, and ball coordinates, intended for evaluating multi-task approaches primarily oriented toward spotting quick events and tracking small objects. TTNet demonstrated 97.0% accuracy in game event spotting, along with a 2-pixel RMSE in ball detection at 97.5% accuracy, on the test part of the presented dataset. The proposed network can process downscaled full HD videos with an inference time below 6 ms per input tensor on a machine with a single consumer-grade GPU. Thus, we contribute to the development of real-time multi-task deep learning applications and present an approach that is potentially capable of replacing manual data collection by sports scouts, supporting referees' decision-making, and gathering extra information about the game process.
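As a rough check of how the reported inference time relates to the 120 fps recordings, the arithmetic below compares the per-frame interval with the 6 ms figure. It assumes, as a simplification not stated in the abstract, one network inference per incoming frame, and it ignores video decoding, downscaling, and post-processing time.

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{120\ \text{frames}} \approx 8.33\ \text{ms per frame},
\qquad
t_{\text{inference}} < 6\ \text{ms} < t_{\text{frame}}
```

Under these assumptions, roughly 2.3 ms per frame would remain for the rest of the pipeline.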