CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering
This work addresses the scarcity of ground truth data for intrinsic image decomposition, a concern for computer vision researchers and practitioners, though it is incremental in that it builds on existing CNN-based methods.
The authors tackle intrinsic image decomposition by building a large-scale synthetic dataset with physically-based rendering and proposing a new training method, reporting state-of-the-art performance on the real-world benchmarks IIW and SAW, with reported improvements in error metrics ranging from 0.0 to 0.5.
Intrinsic image decomposition is a challenging, long-standing computer vision problem for which ground truth data is very difficult to acquire. We explore the use of synthetic data for training CNN-based intrinsic image decomposition models, and then applying these learned models to real-world images. To that end, we present CGIntrinsics, a new, large-scale dataset of physically-based rendered images of scenes with full ground truth decompositions. The rendering process we use is carefully designed to yield high-quality, realistic images, which we find to be crucial for this problem domain. We also propose a new end-to-end training method that learns better decompositions by leveraging CGIntrinsics, and optionally IIW and SAW, two recent datasets of sparse annotations on real-world images. Surprisingly, we find that a decomposition network trained solely on our synthetic data outperforms the state-of-the-art on both IIW and SAW, and performance improves even further when IIW and SAW data is added during training. Our work demonstrates the surprising effectiveness of carefully rendered synthetic data for the intrinsic images task.
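For readers new to the task, the following is a minimal sketch of the standard intrinsic image model, in which an image is the per-pixel product of a reflectance layer and a shading layer (I = R * S). It is an illustration only, not the training method described above: the function names `compose` and `scale` and the toy 2x2 values are hypothetical, and the grayscale, list-based representation is an assumption made to keep the example dependency-free. The sketch also shows the scale ambiguity that helps explain why ground truth decompositions are hard to obtain from images alone.

```python
import math


def compose(reflectance, shading):
    """Per-pixel product I = R * S for grayscale images stored as nested lists."""
    return [[r * s for r, s in zip(r_row, s_row)]
            for r_row, s_row in zip(reflectance, shading)]


def scale(image, k):
    """Multiply every pixel of a nested-list image by the scalar k."""
    return [[k * v for v in row] for row in image]


# Toy 2x2 scene (hypothetical values): R is surface albedo, S is illumination.
R = [[0.8, 0.2], [0.5, 0.5]]
S = [[1.0, 1.0], [0.4, 0.9]]
I = compose(R, S)

# Scale ambiguity: (2R, S/2) renders the same image as (R, S),
# so the image alone cannot pin down a unique decomposition.
I_alt = compose(scale(R, 2.0), scale(S, 0.5))
same_image = all(
    math.isclose(a, b)
    for row_a, row_b in zip(I, I_alt)
    for a, b in zip(row_a, row_b)
)
```

A rendered synthetic scene sidesteps this ambiguity because R and S are known by construction, which is what makes full ground truth decompositions available for training.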