CV · Nov 20, 2025

BoxingVI: A Multi-Modal Benchmark for Boxing Action Recognition and Localization

arXiv:2511.16524v1 · 1 citation · h-index: 2
Originality: Synthesis-oriented
AI Analysis

BoxingVI addresses the dataset bottleneck in computer vision for combat sports by providing a curated benchmark for researchers working on action recognition, automated coaching, and performance assessment in boxing.

The authors address the lack of robust datasets for boxing action recognition with a manually segmented and labeled video dataset of 6,915 punch clips across six punch types, extracted from 20 publicly available YouTube sparring sessions involving 18 athletes. The dataset is intended to support research in real-time vision-based action recognition.

Accurate analysis of combat sports using computer vision has gained traction in recent years, yet the development of robust datasets remains a major bottleneck due to the dynamic, unstructured nature of actions and variations in recording environments. In this work, we present a comprehensive, well-annotated video dataset tailored for punch detection and classification in boxing. The dataset comprises 6,915 high-quality punch clips categorized into six distinct punch types, extracted from 20 publicly available YouTube sparring sessions and involving 18 different athletes. Each clip is manually segmented and labeled to ensure precise temporal boundaries and class consistency, capturing a wide range of motion styles, camera angles, and athlete physiques. This dataset is specifically curated to support research in real-time vision-based action recognition, especially in low-resource and unconstrained environments. By providing a rich benchmark with diverse punch examples, this contribution aims to accelerate progress in movement analysis, automated coaching, and performance assessment within boxing and related domains.
