CV | Nov 20, 2025

BoxingVI: A Multi-Modal Benchmark for Boxing Action Recognition and Localization

Rahul Kumar, Vipul Baghel, Sudhanshu Singh, Bikash Kumar Badatya, Shivam Yadav, Babji Srinivasan, Ravi Hegde

arXiv:2511.16524v1 | 6.21 citations | h-index: 2

Originality: Synthesis-oriented

AI Analysis

This work addresses the dataset bottleneck in computer vision for combat sports by providing a curated benchmark for researchers working on action recognition, automated coaching, and performance assessment in boxing.

The authors address the lack of robust datasets for boxing action recognition by building a manually annotated video dataset of 6,915 punch clips across six punch types, extracted from 20 publicly available YouTube sparring sessions involving 18 athletes, to support research in real-time vision-based action recognition.

Accurate analysis of combat sports using computer vision has gained traction in recent years, yet the development of robust datasets remains a major bottleneck due to the dynamic, unstructured nature of actions and variations in recording environments. In this work, we present a comprehensive, well-annotated video dataset tailored for punch detection and classification in boxing. The dataset comprises 6,915 high-quality punch clips categorized into six distinct punch types, extracted from 20 publicly available YouTube sparring sessions and involving 18 different athletes. Each clip is manually segmented and labeled to ensure precise temporal boundaries and class consistency, capturing a wide range of motion styles, camera angles, and athlete physiques. This dataset is specifically curated to support research in real-time vision-based action recognition, especially in low-resource and unconstrained environments. By providing a rich benchmark with diverse punch examples, this contribution aims to accelerate progress in movement analysis, automated coaching, and performance assessment within boxing and related domains.
