CV · May 29, 2014

Feature sampling and partitioning for visual vocabulary generation on large action classification datasets

arXiv:1405.7545v1 · 20 citations
AI Analysis

This work addresses performance bottlenecks in action classification on video data. It is incremental: it refines existing methods rather than introducing a new paradigm.

The paper tackles the problem of improving action recognition by optimizing feature sampling and vocabulary construction in bag-of-visual-words pipelines, achieving state-of-the-art results on 5 major datasets with relatively small vocabularies.

The recent trend in action recognition is towards larger datasets, an increasing number of action classes and larger visual vocabularies. State-of-the-art human action classification in challenging video data is currently based on a bag-of-visual-words pipeline in which space-time features are aggregated globally to form a histogram. The strategies chosen to sample features and construct a visual vocabulary are critical to performance, in fact often dominating performance. In this work we provide a critical evaluation of various approaches to building a vocabulary and show that good practices do have a significant impact. By subsampling and partitioning features strategically, we are able to achieve state-of-the-art results on 5 major action recognition datasets using relatively small visual vocabularies.
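The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and several choices are assumptions: uniform random subsampling of training descriptors before clustering, plain k-means for the vocabulary, hard assignment of each feature to its nearest visual word, and an L1-normalised global histogram per video. "Partitioning" is illustrated here as splitting each descriptor into fixed channels (for example, the separate components of a composite space-time descriptor) and learning one vocabulary per channel; this is an assumed interpretation, not necessarily the paper's exact scheme. All function names are hypothetical, and the code is pure Python only to stay self-contained.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Slice = Tuple[int, int]  # (start, end) indices of one descriptor channel (assumed layout)


def subsample(features: Sequence[Vector], max_samples: int, seed: int = 0) -> List[Vector]:
    """Uniformly sample at most max_samples descriptors without replacement."""
    if len(features) <= max_samples:
        return list(features)
    rng = random.Random(seed)
    return rng.sample(list(features), max_samples)


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers: Sequence[Vector], v: Vector) -> int:
    """Index of the visual word (cluster centre) closest to v."""
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i], v))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; initial centres are k randomly chosen points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centers, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                dim = len(members[0])
                centers[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centers


def encode(video: Sequence[Vector], vocab: Sequence[Vector]) -> List[float]:
    """Global bag-of-words histogram with hard assignment, L1-normalised."""
    hist = [0.0] * len(vocab)
    for f in video:
        hist[nearest(vocab, f)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def build_partitioned_vocab(train: Sequence[Vector], parts: Sequence[Slice],
                            k: int, max_samples: int, seed: int = 0) -> List[List[Vector]]:
    """Learn one vocabulary per descriptor channel from a subsample of training features."""
    vocabs = []
    for j, (s, e) in enumerate(parts):
        channel = [f[s:e] for f in train]
        sample = subsample(channel, max_samples, seed + j)
        vocabs.append(kmeans(sample, k, seed=seed + j))
    return vocabs


def encode_partitioned(video: Sequence[Vector], vocabs: Sequence[Sequence[Vector]],
                       parts: Sequence[Slice]) -> List[float]:
    """Concatenate the per-channel histograms into one video-level vector."""
    out: List[float] = []
    for vocab, (s, e) in zip(vocabs, parts):
        out.extend(encode([f[s:e] for f in video], vocab))
    return out
```

Subsampling bounds the clustering cost: each k-means iteration above costs O(n·k·d) for n sampled descriptors of dimension d, so clustering a fixed-size sample rather than every extracted feature keeps vocabulary construction tractable as datasets grow. Partitioning keeps each channel's vocabulary small, while the concatenated video vector still has k × (number of channels) dimensions. A practical system would replace the toy k-means with an optimised implementation and feed the resulting vectors to a classifier.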
