Fine-grained Activity Recognition in Baseball Videos
This work addresses activity recognition for baseball video analysis. Its contribution is incremental: it centers on dataset creation and model comparison rather than on a methodological breakthrough.
The paper tackles fine-grained activity recognition in baseball videos by introducing the MLB-YouTube dataset and comparing temporal structure models, finding that learning temporal structure improves recognition.
In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection. The dataset contains two settings: segmented video classification as well as activity detection in continuous videos. We experimentally compare various recognition approaches capturing temporal structure in activity videos, by classifying segmented videos and extending those approaches to continuous videos. We also compare models on the extremely difficult task of predicting pitch speed and pitch type from broadcast baseball videos. We find that learning temporal structure is valuable for fine-grained activity recognition.
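To make concrete why temporal structure can matter for fine-grained activities, the following is a minimal sketch and not the models evaluated in the paper. The abstract does not name the approaches compared, so the two pooling functions here (`max_pool` and `pyramid_pool`), the two-dimensional toy features, and the clip names are all illustrative assumptions. The sketch contrasts an order-invariant pooling of per-frame features with a pooling that keeps coarse temporal order.

```python
from typing import List, Sequence


def max_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Order-invariant baseline: per-dimension max over all frames."""
    return [max(column) for column in zip(*frames)]


def pyramid_pool(frames: Sequence[Sequence[float]], levels: int = 2) -> List[float]:
    """Concatenate max-pooled features over 1, 2, 4, ... equal temporal segments.

    Hypothetical helper for illustration: level k splits the clip into 2**k
    contiguous segments, so the output encodes coarse temporal order.
    """
    n = len(frames)
    if n < 2 ** (levels - 1):
        raise ValueError("clip has fewer frames than the finest pyramid level")
    pooled: List[float] = []
    for level in range(levels):
        parts = 2 ** level
        for p in range(parts):
            start = p * n // parts
            end = (p + 1) * n // parts
            pooled.extend(max_pool(frames[start:end]))
    return pooled


# Toy per-frame features (assumed): dimension 0 fires on "swing", dimension 1 on "hit".
swing_then_hit = [[1.0, 0.0], [0.0, 1.0]]
hit_then_swing = [[0.0, 1.0], [1.0, 0.0]]

# Plain max pooling cannot tell the two orderings apart...
same_under_max = max_pool(swing_then_hit) == max_pool(hit_then_swing)
# ...while the pyramid representation differs between them.
differs_under_pyramid = pyramid_pool(swing_then_hit) != pyramid_pool(hit_then_swing)
```

In this toy case both clips contain the same frames in opposite order, so any representation that discards order treats them as identical. Activities in a broadcast clip that share most of their frames could differ in exactly this way, which is consistent with the abstract's conclusion that learning temporal structure is valuable.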