Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views
This work addresses the viewpoint estimation problem in computer vision. It is incremental in that it builds on existing methods, leveraging 3D models for data synthesis.
The paper tackles object viewpoint estimation from 2D images by addressing data scarcity and feature limitations. Its framework combines images rendered from 3D models with CNNs, and it significantly outperforms state-of-the-art methods on the PASCAL 3D+ benchmark.
Object viewpoint estimation from 2D images is an essential task in computer vision. However, two issues hinder its progress: the scarcity of training data with viewpoint annotations, and a lack of powerful features. Inspired by the growing availability of 3D models, we propose a framework that addresses both issues by combining render-based image synthesis and CNNs. We believe that 3D models have the potential to generate a large number of highly varied images, which can be well exploited by deep CNNs with high learning capacity. Towards this goal, we propose a scalable and overfit-resistant image synthesis pipeline, together with a novel CNN specifically tailored for the viewpoint estimation task. Experimentally, we show that viewpoint estimation with our pipeline significantly outperforms state-of-the-art methods on the PASCAL 3D+ benchmark.
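
To make the idea of render-based synthesis with viewpoint labels concrete, the following is a minimal sketch of how camera poses might be sampled for rendering and turned into training labels. It is an illustration under stated assumptions, not the pipeline described in the paper: the function names (`sample_viewpoint`, `angle_to_bin`, `make_labels`), the sampling distributions, and the choice to treat viewpoint as classification over equal-width angle bins are all hypothetical, since the abstract does not specify them. The rendering step itself is omitted.

```python
import random


def sample_viewpoint(rng):
    """Draw one hypothetical camera pose in degrees: (azimuth, elevation, tilt).

    The distributions below are assumptions made for illustration only.
    """
    azimuth = rng.uniform(0.0, 360.0)
    # Clamp elevation to the physically meaningful range [-90, 90].
    elevation = max(-90.0, min(90.0, rng.gauss(10.0, 15.0)))
    tilt = rng.gauss(0.0, 5.0)
    return azimuth, elevation, tilt


def angle_to_bin(angle_deg, num_bins=360):
    """Map an angle to a class index by splitting the circle into equal bins."""
    width = 360.0 / num_bins
    # The trailing modulo guards against floating-point wrap-around at 360.
    return int((angle_deg % 360.0) // width) % num_bins


def make_labels(num_samples, seed=0, num_bins=360):
    """Sample poses and return one (azimuth, elevation, tilt) bin triple each.

    In a full pipeline each pose would also drive a renderer to produce the
    matching synthetic image; that part is left out here.
    """
    rng = random.Random(seed)
    labels = []
    for _ in range(num_samples):
        pose = sample_viewpoint(rng)
        labels.append(tuple(angle_to_bin(a, num_bins) for a in pose))
    return labels
```

Because the labels come directly from the sampled camera pose, every synthetic image would carry an exact viewpoint annotation at no manual cost, which is the property the abstract appeals to when it describes the scarcity of annotated training data.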