Lasagna: Layered Score Distillation for Disentangled Object Relighting
Lasagna addresses the need for intuitive relighting tools among professional artists and photographers and improves substantially over current methods.
The paper tackles text-guided object relighting in images, a task where existing methods often fail because they alter the object's colors, shapes, and textures. It proposes Lasagna, a method that applies score distillation sampling to a diffusion model fine-tuned on synthetic data. Lasagna achieves state-of-the-art performance, is preferred by human raters in over 91% of cases, and preserves the rest of the input image.
Professional artists, photographers, and other visual content creators use object relighting to achieve the desired effect in their photos. Unfortunately, manual relighting tools have a steep learning curve and are difficult to master. Although generative methods now enable some forms of image editing, relighting remains beyond today's capabilities: existing methods struggle to keep other aspects of the image (colors, shapes, and textures) consistent after the edit. We propose Lasagna, a method that enables intuitive text-guided relighting control. Lasagna learns a lighting prior by using score distillation sampling to distill the prior of a diffusion model that has been fine-tuned on synthetic relighting data. To train Lasagna, we curate a new synthetic dataset, ReLiT, which contains 3D object assets re-lit from multiple light source locations. Despite training on synthetic images, quantitative results show that Lasagna relights real-world images while preserving other aspects of the input image, outperforming state-of-the-art text-guided image editing methods. Lasagna produces realistic and controlled results on natural images and digital art pieces, and human raters prefer it over other methods in over 91% of cases. Finally, we demonstrate the versatility of our learning objective by extending it to colorization, another form of image editing.
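For reference, the generic score distillation sampling (SDS) gradient, as introduced in DreamFusion, is shown below. It optimizes the parameters θ of a differentiable image representation x = g(θ) using a frozen diffusion model: x is noised to x_t at timestep t, the model predicts the noise ε̂_φ given the text condition y, and the residual against the injected noise ε, weighted by w(t), is backpropagated through x only (the diffusion model's Jacobian is skipped). This is the standard SDS formulation, not Lasagna's specific layered objective, whose details the abstract does not give.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\bigl(\phi, x = g(\theta)\bigr)
  = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\bigl(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\bigr)\,
    \frac{\partial x}{\partial \theta} \right],
\qquad x_t = \alpha_t\, x + \sigma_t\, \epsilon,\quad \epsilon \sim \mathcal{N}(0, I)
```

In a relighting setting, one would assume y is the lighting text prompt and ε̂_φ comes from the diffusion model fine-tuned on relit synthetic data, so that the gradient pushes only the edited representation toward images the model considers consistent with the requested lighting.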