Neural Rerendering in the Wild
This work addresses scene modeling and rerendering for applications such as virtual tourism, but it is incremental: it builds on existing 3D reconstruction and neural rendering methods.
The paper tackles total scene capture from internet photos: it reconstructs a 3D point cloud and trains a neural network to rerender images under varying appearance and viewpoint. It demonstrates realistic manipulation of viewpoint, appearance, and semantic labeling in short videos, and compares its results with prior work.
We explore total scene capture -- recording, modeling, and rerendering a scene under varying appearance such as season and time of day. Starting from internet photos of a tourist landmark, we apply traditional 3D reconstruction to register the photos and approximate the scene as a point cloud. For each photo, we render the scene points into a deep framebuffer, and train a neural network to learn the mapping of these initial renderings to the actual photos. This rerendering network also takes as input a latent appearance vector and a semantic mask indicating the location of transient objects like pedestrians. The model is evaluated on several datasets of publicly available images spanning a broad range of illumination conditions. We create short videos demonstrating realistic manipulation of the image viewpoint, appearance, and semantic labeling. We also compare results with prior work on scene reconstruction from internet photos.
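The data flow described above (render the scene points into a deep framebuffer, then feed that buffer to a rerendering network together with a semantic mask and a latent appearance vector) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `render_deep_buffer` and `assemble_network_input`, the pinhole camera parameters `K`, `R`, `t`, the choice of color plus depth as buffer channels, and the channel-wise concatenation of the conditioning inputs are all assumptions made for illustration; the abstract does not specify how the appearance vector and mask are injected into the network.

```python
import numpy as np


def render_deep_buffer(points, colors, K, R, t, height, width):
    """Splat 3D points into an (H, W, 4) buffer of (r, g, b, depth).

    Hypothetical helper: a simple z-buffered point renderer under a
    pinhole camera model (intrinsics K, rotation R, translation t).
    """
    cam = points @ R.T + t                      # world -> camera coordinates
    z = cam[:, 2]
    in_front = z > 1e-6                         # drop points behind the camera
    cam, z, colors = cam[in_front], z[in_front], colors[in_front]

    pix = cam @ K.T                             # homogeneous pixel coordinates
    u = np.round(pix[:, 0] / z).astype(int)     # column index
    v = np.round(pix[:, 1] / z).astype(int)     # row index
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, colors = u[inside], v[inside], z[inside], colors[inside]

    buffer = np.zeros((height, width, 4), dtype=np.float32)
    depth = np.full((height, width), np.inf, dtype=np.float32)
    # Explicit z-test: the nearest point wins at each pixel.
    for ui, vi, zi, ci in zip(u, v, z, colors):
        if zi < depth[vi, ui]:
            depth[vi, ui] = zi
            buffer[vi, ui, :3] = ci
    # Pixels that no point reached keep depth 0.
    buffer[..., 3] = np.where(np.isfinite(depth), depth, 0.0)
    return buffer


def assemble_network_input(deep_buffer, semantic_mask, appearance):
    """Stack the buffer, a transient-object mask, and an appearance vector.

    Assumption: the conditioning inputs are concatenated as extra
    channels, with the appearance vector broadcast to every pixel.
    """
    h, w, _ = deep_buffer.shape
    mask = semantic_mask.astype(np.float32)[..., None]
    app = np.broadcast_to(
        appearance.astype(np.float32), (h, w, appearance.shape[0])
    )
    return np.concatenate([deep_buffer, mask, app], axis=-1)


# Usage: two points project to the same pixel; the nearer one is kept.
K = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 1.0]])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
buf = render_deep_buffer(points, colors, K, np.eye(3), np.zeros(3), 3, 3)
net_in = assemble_network_input(
    buf, np.zeros((3, 3), dtype=bool), np.array([0.5, -0.5])
)
```

In this sketch the network would be trained to map `net_in` for each registered photo to the photo itself, which is the mapping from initial renderings to real images that the abstract describes.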