Scalable domain adaptation of convolutional neural networks
This work addresses the need for scalable domain adaptation in computer vision for domains like tourism, where manual labeling is costly, though it is incremental as it builds on standard CNN architectures.
The paper tackled the problem of adapting convolutional neural networks to new visual domains without manual labeling by using noisy Flickr images and reranking techniques, resulting in improved performance over generic models and non-CNN baselines on datasets like Oxford5k, INRIA Holidays, and Div150Cred.
Convolutional neural networks (CNNs) tend to become a standard approach to solve a wide array of computer vision problems. Besides important theoretical and practical advances in their design, their success is built on the existence of manually labeled visual resources, such as ImageNet. The creation of such datasets is cumbersome and here we focus on alternatives to manual labeling. We hypothesize that new resources are of uttermost importance in domains which are not or weakly covered by ImageNet, such as tourism photographs. We first collect noisy Flickr images for tourist points of interest and apply automatic or weakly-supervised reranking techniques to reduce noise. Then, we learn domain adapted models with a standard CNN architecture and compare them to a generic model obtained from ImageNet. Experimental validation is conducted with publicly available datasets, including Oxford5k, INRIA Holidays and Div150Cred. Results show that low-cost domain adaptation improves results compared to the use of generic models but also compared to strong non-CNN baselines such as triangulation embedding.