Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes
This paper addresses the inverse rendering problem for indoor scene understanding; its contribution is incremental over recent generative model-based methods.
The paper tackles the ill-posed problem of inverse rendering from a single RGB image by proposing a diffusion-based framework that decomposes the image into geometry, material, and lighting. The proposed channel-wise noise scheduling lets a single diffusion architecture be trained into two models, one that predicts a single accurate solution and one that generates diverse plausible solutions. The experiments are reported to show superiority in both diversity and accuracy.
We propose a diffusion-based inverse rendering framework that decomposes a single RGB image into geometry, material, and lighting. Inverse rendering is inherently ill-posed, making it difficult to predict a single accurate solution. To address this challenge, recent generative model-based methods aim to present a range of possible solutions. However, finding a single accurate solution and generating diverse solutions can be conflicting. In this paper, we propose a channel-wise noise scheduling approach that allows a single diffusion model architecture to achieve two conflicting objectives. The resulting two diffusion models, trained with different channel-wise noise schedules, can predict a single highly accurate solution and present multiple possible solutions. The experimental results demonstrate the superiority of our two models in terms of both diversity and accuracy, which translates to enhanced performance in downstream applications such as object insertion and material editing.
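To make the idea of a channel-wise noise schedule concrete, the following is a minimal sketch of a forward diffusion step in which each channel follows its own schedule. It is not the authors' formulation: the cosine form, the per-channel exponent `powers`, and the names `channel_alpha_bar` and `forward_noise` are all assumptions chosen for illustration, and the three channels merely stand in for quantities such as geometry, material, and lighting.

```python
import numpy as np


def channel_alpha_bar(t, powers):
    """Cumulative signal level per channel at time t in [0, 1].

    Assumption: a cosine schedule warped by a hypothetical per-channel
    exponent. A larger exponent keeps that channel cleaner at the same t.
    """
    powers = np.asarray(powers, dtype=np.float64)
    return np.cos(0.5 * np.pi * float(t) ** powers) ** 2


def forward_noise(x0, t, powers, rng):
    """Noise x0 of shape (C, H, W) with a separate noise level per channel."""
    # Reshape to (C, 1, 1) so each channel's level broadcasts over H and W.
    alpha_bar = channel_alpha_bar(t, powers)[:, None, None]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps


rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 4, 4))  # toy 3-channel "scene property" map
powers = [0.5, 1.0, 2.0]             # hypothetical per-channel exponents
x_t, eps = forward_noise(x0, 0.5, powers, rng)
```

In this sketch, at `t = 0.5` the first channel is already mostly noise while the third remains mostly signal, so the schedule determines which channels a denoiser must infer from which others at each step. Two models trained with different choices of `powers` would share one architecture but differ in behavior, which is the general mechanism the abstract describes; the specific schedules used in the paper are not given here.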