Game State and Spatio-temporal Action Detection in Soccer using Graph Neural Networks and 3D Convolutional Networks
This work addresses the need for more efficient and accurate soccer analytics tools for sports analysts and teams. Its contribution is incremental: it builds on existing action detection methods by adding contextual game state.
The paper aims to automate spatio-temporal action detection in soccer videos, since annotating events is currently a manual and costly task. It integrates visual data with game state information such as player positions and team membership, which improves detection metrics.
Soccer analytics rely on two data sources: the player positions on the pitch and the sequences of events they perform. With around 2000 ball events per game, their precise and exhaustive annotation based on a monocular video stream remains a tedious and costly manual task. While state-of-the-art spatio-temporal action detection methods show promise for automating this task, they lack contextual understanding of the game. Assuming professional players' behaviors are interdependent, we hypothesize that incorporating surrounding players' information such as positions, velocities and team membership can enhance purely visual predictions. We propose a spatio-temporal action detection approach that combines visual and game state information via Graph Neural Networks trained end-to-end with state-of-the-art 3D CNNs, demonstrating improved metrics through game state integration.
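The idea of combining per-player visual features with game state on a graph can be illustrated with a minimal sketch. This is not the authors' architecture: the distance threshold `radius`, the dictionary layout of a player, the two-dimensional placeholder "visual" vectors (standing in for 3D CNN embeddings), and the unweighted mean aggregation (standing in for a learned Graph Neural Network layer) are all assumptions made for illustration.

```python
import math

def build_player_graph(players, radius=15.0):
    """Connect players whose pitch distance is within `radius` metres.
    The threshold is a hypothetical choice, not taken from the paper."""
    edges = {i: [] for i in range(len(players))}
    for i, p in enumerate(players):
        for j, q in enumerate(players):
            if i != j and math.dist(p["pos"], q["pos"]) <= radius:
                edges[i].append(j)
    return edges

def node_features(player, visual):
    """Concatenate a visual embedding with game state:
    position (x, y), velocity (vx, vy) and a team flag."""
    return list(visual) + [*player["pos"], *player["vel"], float(player["team"])]

def message_pass(features, edges):
    """One round of mean aggregation: each node averages its own
    feature vector with those of its neighbours (no learned weights)."""
    updated = []
    for i, own in enumerate(features):
        group = [own] + [features[j] for j in edges[i]]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated

# Toy scene: two nearby opponents and one distant player.
players = [
    {"pos": (10.0, 20.0), "vel": (1.0, 0.0), "team": 0},
    {"pos": (14.0, 23.0), "vel": (0.0, 1.0), "team": 1},
    {"pos": (80.0, 40.0), "vel": (0.0, 0.0), "team": 1},
]
# Placeholder embeddings; in practice these would come from a 3D CNN.
visual = [[0.2, 0.4], [0.6, 0.8], [1.0, 1.0]]

edges = build_player_graph(players)
features = [node_features(p, v) for p, v in zip(players, visual)]
updated = message_pass(features, edges)
```

In this toy scene the first two players are 5 m apart and therefore exchange information, while the third player has no neighbours and keeps its original features. A trained model would replace the plain average with learned transformations and feed the updated node features to an action classifier.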