Relighting Images in the Wild with a Self-Supervised Siamese Auto-Encoder
This work addresses the problem of realistic image relighting for general users by removing the need for supervised data or 3D models.
This paper proposes a self-supervised auto-encoder for relighting single-view images by disentangling illumination and content encodings. The method introduces a spherical harmonic loss and achieves performance similar to supervised methods without supervision or a prior shape model, avoiding common lighting artifacts.
We propose a self-supervised method for relighting single-view images in the wild. The method is based on an auto-encoder that decomposes an image into two separate encodings, one for the scene illumination and one for the content. To disentangle these two encodings without supervision, we exploit the assumption that some augmentation operations leave the image content unchanged and affect only the direction of the light. We introduce a novel loss function, the spherical harmonic loss, which forces the illumination embedding to take the form of a spherical harmonic vector. We train our model on large-scale datasets such as YouTube-8M and CelebA. Our experiments show that our method can correctly estimate scene illumination and realistically relight input images without any supervision or a prior shape model. Compared to supervised methods, our approach achieves similar performance and avoids common lighting artifacts.
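To make the augmentation assumption concrete, the following is a minimal NumPy sketch of how a horizontal flip could constrain a 9-dimensional (bands 0 to 2) spherical harmonic lighting vector. The abstract does not specify the augmentations or the exact form of the spherical harmonic loss, so the choice of a horizontal flip, the coefficient ordering, and the names `sh_basis`, `flip_sh`, and `flip_consistency_loss` are all illustrative assumptions, not the method's actual implementation. The sketch relies only on a standard property of real spherical harmonics: mirroring the x axis negates exactly those coefficients whose basis function is odd in x.

```python
import numpy as np


def sh_basis(normals):
    """Real spherical harmonic basis (bands 0-2) at unit vectors of shape (..., 3)."""
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    return np.stack(
        [
            0.282095 * np.ones_like(x),      # Y_0^0  (constant)
            0.488603 * y,                    # Y_1^-1
            0.488603 * z,                    # Y_1^0
            0.488603 * x,                    # Y_1^1  (odd in x)
            1.092548 * x * y,                # Y_2^-2 (odd in x)
            1.092548 * y * z,                # Y_2^-1
            0.315392 * (3.0 * z**2 - 1.0),   # Y_2^0
            1.092548 * x * z,                # Y_2^1  (odd in x)
            0.546274 * (x**2 - y**2),        # Y_2^2
        ],
        axis=-1,
    )


# Sign pattern for the mirror x -> -x under the ordering used above (assumed).
FLIP_X_SIGNS = np.array([1, 1, 1, -1, -1, 1, 1, -1, 1], dtype=float)


def flip_sh(coeffs):
    """SH coefficients of the lighting after a horizontal mirror of the scene."""
    return coeffs * FLIP_X_SIGNS


def flip_consistency_loss(sh_original, sh_flipped):
    """Hypothetical loss: the embedding predicted for the flipped image should
    match the analytically flipped embedding of the original image."""
    return float(np.mean((sh_flipped - flip_sh(sh_original)) ** 2))
```

In a training loop, `sh_original` and `sh_flipped` would be the illumination embeddings the encoder produces for an image and its mirrored copy; penalising their mismatch in this way is one plausible reading of how an augmentation can supervise the illumination code without labels.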