CV · Mar 26, 2016

How useful is photo-realistic rendering for visual learning?

arXiv:1603.08152v2 · 159 citations
AI Analysis

This work addresses the cost and time required to label datasets for computer vision tasks. It offers a semi-automated approach that builds incrementally on existing rendering technology.

The paper tackles the problem of creating high-quality labeled datasets for visual learning by generating synthetic data with photo-realistic rendering. It focuses on object viewpoint estimation for cars and shows that combining synthetic images with a small amount of real data improves accuracy.

Data seems cheap to get, and in many ways it is, but the process of creating a high quality labeled dataset from a mass of data is time-consuming and expensive. With the advent of rich 3D repositories, photo-realistic rendering systems offer the opportunity to provide nearly limitless data. Yet, their primary value for visual learning may be the quality of the data they can provide rather than the quantity. Rendering engines offer the promise of perfect labels in addition to the data: what the precise camera pose is; what the precise lighting location, temperature, and distribution is; what the geometry of the object is. In this work we focus on semi-automating dataset creation through use of synthetic data and apply this method to an important task -- object viewpoint estimation. Using state-of-the-art rendering software we generate a large labeled dataset of cars rendered densely in viewpoint space. We investigate the effect of rendering parameters on estimation performance and show realism is important. We show that generalizing from synthetic data is not harder than the domain adaptation required between two real-image datasets and that combining synthetic images with a small amount of real data improves estimation accuracy.
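To make the idea of "perfect labels" for viewpoint estimation concrete, here is a minimal sketch of how a dense set of camera poses around an object might be enumerated, with each pose doubling as the ground-truth label for the image rendered from it. The 5-degree azimuth step, the elevation set, the camera radius, the 24-bin discretization, and the names `Viewpoint`, `dense_viewpoints`, and `azimuth_bin` are all illustrative assumptions, not the paper's actual settings or code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewpoint:
    azimuth_deg: float    # rotation around the vertical axis
    elevation_deg: float  # angle above the ground plane
    position: tuple       # camera location (x, y, z); object sits at the origin


def dense_viewpoints(azimuth_step=5, elevations=(0, 10, 20, 30), radius=4.0):
    """Enumerate camera poses on a sphere around an object at the origin.

    Step size, elevation set, and radius are assumed values for illustration.
    """
    views = []
    for elevation in elevations:
        for azimuth in range(0, 360, azimuth_step):
            az, el = math.radians(azimuth), math.radians(elevation)
            # Spherical-to-Cartesian conversion with z pointing up.
            position = (
                radius * math.cos(el) * math.cos(az),
                radius * math.cos(el) * math.sin(az),
                radius * math.sin(el),
            )
            views.append(Viewpoint(float(azimuth), float(elevation), position))
    return views


def azimuth_bin(azimuth_deg, num_bins=24):
    """Map an exact azimuth to a discrete class index (assumed 24 bins)."""
    return int((azimuth_deg % 360) // (360 / num_bins))
```

In a pipeline along these lines, each `Viewpoint` would be handed to the renderer and stored next to the resulting image, so the exact camera pose is known by construction rather than annotated by hand.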

