Predicting Penalty Kick Direction Using Multi-Modal Deep Learning with Pose-Guided Attention
This work addresses penalty kick anticipation for goalkeepers and analysts, but its contribution is incremental: it applies existing deep learning methods to a domain-specific problem.
The study predicts penalty kick direction with a real-time, multi-modal deep learning framework that combines RGB frames and pose keypoints, achieving 89% accuracy on a held-out test set and outperforming visual-only and pose-only baselines by 14-22%.
Penalty kicks often decide championships, yet goalkeepers must anticipate the kicker's intent from subtle biomechanical cues within a very short time window. This study introduces a real-time, multi-modal deep learning framework to predict the direction of a penalty kick (left, middle, or right) before ball contact. The model uses a dual-branch architecture: a MobileNetV2-based CNN extracts spatial features from RGB frames, while 2D keypoints are processed by an LSTM network with attention mechanisms. Pose-derived keypoints further guide visual focus toward task-relevant regions. A distance-based thresholding method segments input sequences immediately before ball contact, ensuring consistent input across diverse footage. A custom dataset of 755 penalty kick events was created from real match videos, with frame-level annotations for object detection, shooter keypoints, and final ball placement. The model achieved 89% accuracy on a held-out test set, outperforming visual-only and pose-only baselines by 14-22%. With an inference time of 22 milliseconds, the lightweight and interpretable design makes it suitable for goalkeeper training, tactical analysis, and real-time game analytics.
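The distance-based thresholding step can be illustrated with a minimal sketch. The abstract does not specify which keypoint is measured, the threshold value, or the window length, so everything below is an assumption made for illustration: the kicking-foot keypoint and ball center are taken as per-frame pixel coordinates, contact is approximated as the first frame where their Euclidean distance falls to or below a hypothetical `threshold_px`, and the model input is a fixed-length `window` of frames ending just before that frame. The function names are hypothetical and are not taken from the authors' code.

```python
from math import hypot
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def find_contact_frame(
    foot_xy: Sequence[Point],
    ball_xy: Sequence[Point],
    threshold_px: float,
) -> Optional[int]:
    """Return the index of the first frame whose foot-to-ball distance
    is at or below threshold_px, or None if no frame qualifies."""
    for i, ((fx, fy), (bx, by)) in enumerate(zip(foot_xy, ball_xy)):
        if hypot(fx - bx, fy - by) <= threshold_px:
            return i
    return None


def pre_contact_window(
    foot_xy: Sequence[Point],
    ball_xy: Sequence[Point],
    threshold_px: float = 30.0,  # assumed value, not from the paper
    window: int = 16,            # assumed sequence length, not from the paper
) -> Optional[Tuple[int, int]]:
    """Return a half-open (start, end) frame range of length `window`
    that ends immediately before the estimated contact frame.

    Returns None when contact is never detected or when fewer than
    `window` frames precede it, so every returned range has equal length.
    """
    contact = find_contact_frame(foot_xy, ball_xy, threshold_px)
    if contact is None or contact < window:
        return None
    return (contact - window, contact)


if __name__ == "__main__":
    # Toy clip: the foot approaches a stationary ball along the x-axis.
    foot = [(10.0 * i, 0.0) for i in range(12)]
    ball = [(100.0, 0.0)] * 12
    print(pre_contact_window(foot, ball, threshold_px=30.0, window=4))
```

A fixed pixel threshold is the simplest choice here. For footage with varying camera distance, normalizing the distance by a body-scale measure such as the shooter's bounding-box height would be a natural variation, though whether the authors do this is not stated.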