CV · Jun 5, 2019

Single-Camera Basketball Tracker through Pose and Semantic Feature Fusion

arXiv:1906.02042v2 · 23 citations
Originality: Incremental advance
AI Analysis

This work addresses player tracking for sports analytics in challenging single-feed basketball videos, presenting an incremental improvement by showing that deep learning features alone can suffice without additional contextual cues.

The paper tackles the problem of tracking basketball players in single-camera videos with clutter and occlusions by developing a tracker that fuses pose and semantic features. Performance is reported in terms of MOTA on a dataset with more than 10k instances.
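To make the idea of fusing pose and semantic features concrete, the following is a minimal sketch of frame-to-frame association. It is not the paper's implementation: the weighted-sum cost, the greedy matching, and all names and parameters (`fused_cost`, `greedy_match`, `alpha`, `pose_scale`, `max_cost`) are assumptions chosen for illustration, and the semantic feature vectors are assumed to have been extracted beforehand, for example from a convolutional layer.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty descriptor as uninformative
    return 1.0 - dot / (norm_a * norm_b)

def pose_distance(p, q):
    """Mean Euclidean distance (pixels) between corresponding keypoints."""
    return sum(math.dist(u, v) for u, v in zip(p, q)) / len(p)

def fused_cost(track, det, alpha=0.5, pose_scale=100.0):
    """Hypothetical fusion: weighted sum of a pose term and a semantic term.

    alpha and pose_scale are illustrative values, not taken from the paper.
    """
    pose_term = min(pose_distance(track["pose"], det["pose"]) / pose_scale, 1.0)
    semantic_term = cosine_distance(track["feat"], det["feat"])
    return alpha * pose_term + (1.0 - alpha) * semantic_term

def greedy_match(tracks, dets, max_cost=0.5):
    """Assign detections to tracks, cheapest pairs first (an assumed strategy)."""
    pairs = sorted(
        (fused_cost(t, d), i, j)
        for i, t in enumerate(tracks)
        for j, d in enumerate(dets)
    )
    used_tracks, used_dets, matches = set(), set(), {}
    for cost, i, j in pairs:
        if cost > max_cost:
            break  # remaining pairs are all too dissimilar
        if i in used_tracks or j in used_dets:
            continue
        matches[i] = j
        used_tracks.add(i)
        used_dets.add(j)
    return matches
```

Each track and detection here is a dictionary with a `pose` list of `(x, y)` keypoints and a `feat` vector. A greedy pass keeps the sketch dependency-free; an optimal assignment solver could be substituted without changing the cost function.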

Tracking sports players is a challenging scenario, especially in single-feed videos recorded on tight courts, where clutter and occlusions cannot be avoided. This paper presents an analysis of several geometric and semantic visual features to detect and track basketball players. An ablation study is carried out and used to show that a robust tracker can be built with deep learning features alone, without extracting contextual features such as proximity or color similarity, and without applying camera stabilization techniques. The presented tracker consists of (1) a detection step, which uses a pretrained deep learning model to estimate the players' poses, followed by (2) a tracking step, which leverages pose and semantic information from the output of a convolutional layer in a VGG network. Its performance is analyzed in terms of MOTA on a basketball dataset with more than 10k instances.
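For readers unfamiliar with the metric, MOTA (Multiple Object Tracking Accuracy) is commonly defined as one minus the ratio of all tracking errors (missed targets, false positives, and identity switches) to the total number of ground-truth instances, summed over all frames. The sketch below computes it from per-frame counts. The function name and the input layout are assumptions for illustration, and no figures from the paper are reproduced here.

```python
def mota(frames):
    """Compute MOTA from per-frame error counts.

    frames: iterable of (false_negatives, false_positives,
            id_switches, num_ground_truth) tuples, one per frame.
            This layout is a hypothetical convention for the example.
    """
    total_errors = 0
    total_gt = 0
    for fn, fp, idsw, gt in frames:
        total_errors += fn + fp + idsw
        total_gt += gt
    if total_gt <= 0:
        raise ValueError("MOTA is undefined without ground-truth instances")
    # MOTA can be negative when errors outnumber ground-truth instances.
    return 1.0 - total_errors / total_gt
```

Because every error type is normalized by the same ground-truth count, a single score summarizes detection and identity quality together, at the cost of hiding which error type dominates.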

