Webly Supervised Learning of Convolutional Networks
This work addresses the challenge of leveraging large-scale, noisy web data for computer vision tasks, offering a robust method for webly supervised learning that reduces reliance on curated datasets.
The paper tackles the problem of training convolutional neural networks (CNNs) on noisy web data by proposing a two-step curriculum learning approach. The resulting model outperforms a fine-tuned ImageNet CNN on Pascal VOC 2012, and a detector trained from web images achieves the best performance on VOC 2007 in the setting where no VOC training data is used.
We present an approach to utilize large amounts of web data for learning CNNs. Specifically, inspired by curriculum learning, we present a two-step approach for CNN training. First, we use easy images to train an initial visual representation. We then adapt this initial CNN to harder, more realistic images by leveraging the structure of data and categories. We demonstrate that our two-stage CNN outperforms a fine-tuned CNN trained on ImageNet on Pascal VOC 2012. We also demonstrate the strength of webly supervised learning by localizing objects in web images and training an R-CNN-style detector. This detector achieves the best performance on VOC 2007 in the setting where no VOC training data is used. Finally, we show that our approach is quite robust to noise and performs comparably even when we use image search results from March 2013 (the pre-CNN image search era).
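The two-stage schedule described above can be illustrated with a toy example. The sketch below is not the paper's implementation. It makes several assumptions: a small logistic-regression model stands in for the CNN, the "easy" and "hard" sets are synthetic 2-D points (the hard set has more spread and 20% flipped labels), and the second stage simply continues training from the first-stage weights with a smaller learning rate. It does not model how the structure of data and categories is used. The names `make_data`, `train`, and `accuracy`, and all hyperparameters, are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n, label_noise, spread, rng):
    """Synthetic 2-D binary data; `label_noise` is the fraction of flipped labels."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 2.0 if y == 1 else -2.0
        x = [rng.gauss(center, spread), rng.gauss(center, spread)]
        if rng.random() < label_noise:
            y = 1 - y  # simulate a wrong web label
        data.append((x, y))
    return data


def train(weights, data, lr, epochs, rng):
    """Plain SGD on the logistic loss, starting from `weights`; returns new weights."""
    w = list(weights)
    for _ in range(epochs):
        order = list(data)  # shuffle a copy so the caller's list is untouched
        rng.shuffle(order)
        for x, y in order:
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])
            g = p - y  # gradient of the loss with respect to the logit
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            w[2] -= lr * g
    return w


def accuracy(w, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(
        1 for x, y in data
        if (sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) >= 0.5) == (y == 1)
    )
    return correct / len(data)


rng = random.Random(0)
easy = make_data(200, label_noise=0.0, spread=0.5, rng=rng)  # stage-1 data
hard = make_data(200, label_noise=0.2, spread=1.5, rng=rng)  # stage-2 data
test = make_data(200, label_noise=0.0, spread=1.5, rng=rng)  # clean held-out set

# Stage 1: learn an initial model from easy, cleanly labeled samples.
w_stage1 = train([0.0, 0.0, 0.0], easy, lr=0.1, epochs=5, rng=rng)
# Stage 2: adapt the stage-1 model to harder, noisier samples (smaller step size).
w_stage2 = train(w_stage1, hard, lr=0.01, epochs=5, rng=rng)

print("stage 1 accuracy:", accuracy(w_stage1, test))
print("stage 2 accuracy:", accuracy(w_stage2, test))
```

The point of the sketch is only the ordering: the second call to `train` starts from `w_stage1` rather than from scratch, so the noisy data refines an existing model instead of shaping it from the beginning.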