Empty Cities: Image Inpainting for a Dynamic-Object-Invariant Space
This work addresses the problem of creating static images from dynamic scenes for applications like augmented reality or robot localization, representing an incremental improvement in image inpainting.
The paper tackles the problem of converting images with dynamic objects like vehicles or pedestrians into realistic static frames by detecting dynamic objects and inpainting occluded backgrounds, achieving results validated through qualitative and quantitative comparisons against state-of-the-art inpainting methods.
In this paper we present an end-to-end deep learning framework to turn images that show dynamic content, such as vehicles or pedestrians, into realistic static frames. This objective encounters two main challenges: detecting all the dynamic objects, and inpainting the static occluded background with plausible imagery. The second problem is approached with a conditional generative adversarial model that, taking as input the original dynamic image and its dynamic/static binary mask, is capable of generating the final static image. The former challenge is addressed by the use of a convolutional network that learns a multi-class semantic segmentation of the image. These generated images can be used for applications such as augmented reality or vision-based robot localization purposes. To validate our approach, we show both qualitative and quantitative comparisons against other state-of-the-art inpainting methods by removing the dynamic objects and hallucinating the static structure behind them. Furthermore, to demonstrate the potential of our results, we carry out pilot experiments that show the benefits of our proposal for visual place recognition.