SMART: SMPLest-X Mesh Adaptation and RAFT Tracking for Soccer Pose Estimation
We present our approach to the FIFA Skeletal Tracking Challenge 2026, which requires estimating 3D world-space poses of soccer players from broadcast video. Our method fine-tunes SMPLest-X (ViT-H, 687M parameters) using a stratified clip split, multi-task depth supervision, and broadcast augmentation. It pairs the fine-tuned model with a camera tracker based on RAFT dense optical flow, foot-plane anchoring, and two-pass temporal smoothing. On the validation set, where lower scores are better, SMART achieves 0.647 against the FIFA baseline of 1.053, a 38.6% relative improvement. On the held-out test set, SMART scores 0.593 (Global MPJPE: 0.324 m, Local MPJPE: 0.054 m).
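The 38.6% figure is the baseline-to-SMART score reduction divided by the baseline score. The sketch below checks that arithmetic and also illustrates MPJPE (mean per-joint position error) as it is commonly defined: the average Euclidean distance between predicted and ground-truth joints. This abstract does not state the challenge's exact Global and Local MPJPE definitions or how they combine into the overall score. Treating Local MPJPE as root-relative, with joint index 0 as the root, is an assumption. The helper names (`relative_improvement`, `mpjpe`, `root_relative`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) in metres
Frame = Sequence[Joint]              # all joints of one player in one frame


def relative_improvement(baseline: float, ours: float) -> float:
    """Fractional reduction of a lower-is-better score relative to a baseline."""
    return (baseline - ours) / baseline


def mpjpe(pred: Sequence[Frame], gt: Sequence[Frame]) -> float:
    """Mean Euclidean distance over every (frame, joint) pair."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of frames")
    errors = []
    for pred_frame, gt_frame in zip(pred, gt):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("pred and gt must have the same joints per frame")
        errors.extend(math.dist(p, g) for p, g in zip(pred_frame, gt_frame))
    return sum(errors) / len(errors)


def root_relative(frames: Sequence[Frame], root: int = 0) -> List[List[Joint]]:
    """Subtract each frame's root joint (assumed index 0) from every joint.

    This removes global translation, so MPJPE on the result measures
    body articulation only (our assumed reading of "Local MPJPE").
    """
    out = []
    for frame in frames:
        rx, ry, rz = frame[root]
        out.append([(x - rx, y - ry, z - rz) for x, y, z in frame])
    return out


print(f"{relative_improvement(1.053, 0.647):.1%}")  # 38.6%
```

Under this reading, a prediction that is correct in shape but offset by 5 m on the pitch has a global MPJPE of 5 m and a root-relative MPJPE of 0 m. This is why the two errors in the test-set result can differ by roughly an order of magnitude.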