CV · Jun 9, 2017

Learning to Learn from Noisy Web Videos

Serena Yeung, Vignesh Ramanathan, Olga Russakovsky, Liyue Shen, Greg Mori, Li Fei-Fei

arXiv:1706.02884v1 · 12.932 citations

Originality: Incremental advance

AI Analysis

The paper addresses the scalability problem in action recognition by reducing reliance on manual labeling. It is an incremental advance, since it builds on existing semi-supervised approaches.

The paper tackles the problem of learning visual action classifiers from noisy web videos. It proposes a reinforcement learning-based method that automatically selects training examples, and reports accurate classifiers on the Sports-1M benchmark and on additional fine-grained action classes.

Understanding the simultaneously very diverse and intricately fine-grained set of possible human actions is a critical open problem in computer vision. Manually labeling training videos is feasible for some action classes but doesn't scale to the full long-tailed distribution of actions. A promising way to address this is to leverage noisy data from web queries to learn new actions, using semi-supervised or "webly-supervised" approaches. However, these methods typically do not learn domain-specific knowledge, or rely on iterative hand-tuned data labeling policies. In this work, we instead propose a reinforcement learning-based formulation for selecting the right examples for training a classifier from noisy web search results. Our method uses Q-learning to learn a data labeling policy on a small labeled training dataset, and then uses this to automatically label noisy web data for new visual concepts. Experiments on the challenging Sports-1M action recognition benchmark as well as on additional fine-grained and newly emerging action classes demonstrate that our method is able to learn good labeling policies for noisy data and use this to learn accurate visual concept classifiers.
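To make the idea of a learned data labeling policy concrete, here is a minimal toy sketch of tabular Q-learning applied to example selection. It is not the authors' implementation. The abstract does not specify the state, action, or reward, so everything below is assumed for illustration: the state is a discretized relevance score of the current candidate, the actions are skip or select, and the reward is +1 for selecting a true positive and -1 for selecting noise, standing in for whatever classifier-quality signal a real system would use. The names `make_pool`, `train_policy`, and `select_examples` are hypothetical.

```python
import random

N_BUCKETS = 5          # assumed state space: discretized score of the current candidate
SKIP, SELECT = 0, 1    # assumed actions: ignore the candidate or add it as a positive


def bucket(score):
    """Map a score in [0, 1] to a discrete state index."""
    return min(int(score * N_BUCKETS), N_BUCKETS - 1)


def make_pool(rng, n):
    """Toy stand-in for labeled search results: (score, is_true_positive) pairs."""
    pool = []
    for _ in range(n):
        is_pos = rng.random() < 0.5
        # Synthetic assumption: true positives tend to have high scores, noise low ones.
        score = rng.uniform(0.6, 1.0) if is_pos else rng.uniform(0.0, 0.4)
        pool.append((score, is_pos))
    return pool


def train_policy(labeled_pool, episodes=200, alpha=0.1, gamma=0.5,
                 epsilon=0.2, seed=0):
    """Learn a Q-table over (state, action) on a small labeled pool."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_BUCKETS)]
    for _ in range(episodes):
        order = labeled_pool[:]
        rng.shuffle(order)
        for i, (score, is_pos) in enumerate(order):
            s = bucket(score)
            # Epsilon-greedy exploration.
            if rng.random() < epsilon:
                a = rng.choice((SKIP, SELECT))
            else:
                a = SELECT if q[s][SELECT] > q[s][SKIP] else SKIP
            # Assumed reward proxy: selecting a true positive helps, selecting noise hurts.
            r = 0.0 if a == SKIP else (1.0 if is_pos else -1.0)
            # Standard Q-learning target; the last candidate ends the episode.
            if i + 1 < len(order):
                next_s = bucket(order[i + 1][0])
                target = r + gamma * max(q[next_s])
            else:
                target = r
            q[s][a] += alpha * (target - q[s][a])
    return q


def select_examples(q, scores):
    """Apply the greedy learned policy to unlabeled candidates."""
    return [x for x in scores if q[bucket(x)][SELECT] > q[bucket(x)][SKIP]]


rng = random.Random(42)
q_table = train_policy(make_pool(rng, 100))
new_scores = [0.05, 0.3, 0.7, 0.95]   # unlabeled candidates for a new concept
selected = select_examples(q_table, new_scores)
print(selected)
```

The policy is trained once on the small labeled pool and then reused unchanged on unlabeled candidates, which mirrors the two-stage structure described in the abstract. On this synthetic data the greedy policy should keep the high-score candidates and skip the low-score ones. A realistic version would replace the score bucket with a richer state and the hand-set reward with a measured change in classifier quality.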
