CV | GR | May 1, 2024

RGB$\leftrightarrow$X: Image decomposition and synthesis using material- and lighting-aware diffusion models

arXiv:2405.00666v1 | 166 citations | h-index: 20 | SIGGRAPH
Originality: Incremental advance
AI Analysis

This work addresses the challenge of bridging forward rendering, inverse rendering, and generative synthesis for graphics and vision applications, offering a flexible approach for scene understanding and image generation.

The paper tackles the problem of decomposing images into material and lighting properties (RGB→X) and synthesizing realistic images from those properties (X→RGB) for interior scenes, introducing diffusion models that achieve improved estimation and highly realistic synthesis.

The three areas of realistic forward rendering, per-pixel inverse rendering, and generative image synthesis may seem like separate and unrelated sub-fields of graphics and vision. However, recent work has demonstrated improved estimation of per-pixel intrinsic channels (albedo, roughness, metallicity) based on a diffusion architecture; we call this the RGB$\rightarrow$X problem. We further show that the reverse problem of synthesizing realistic images given intrinsic channels, X$\rightarrow$RGB, can also be addressed in a diffusion framework. Focusing on the image domain of interior scenes, we introduce an improved diffusion model for RGB$\rightarrow$X, which also estimates lighting, as well as the first diffusion X$\rightarrow$RGB model capable of synthesizing realistic images from (full or partial) intrinsic channels. Our X$\rightarrow$RGB model explores a middle ground between traditional rendering and generative models: we can specify only certain appearance properties that should be followed, and give freedom to the model to hallucinate a plausible version of the rest. This flexibility makes it possible to use a mix of heterogeneous training datasets, which differ in the available channels. We use multiple existing datasets and extend them with our own synthetic and real data, resulting in a model capable of extracting scene properties better than previous work and of generating highly realistic images of interior scenes.
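The idea of conditioning on "full or partial" intrinsic channels can be illustrated with a small sketch. This is not the paper's implementation: the channel layout `CHANNELS`, the helper `build_condition`, and the zero-fill-plus-mask scheme are all assumptions chosen for illustration, and lighting is left out for brevity. The sketch stacks whichever channels are available into one conditioning array and records which ones were actually supplied, which is one plausible way a model could be told what to follow and what it is free to hallucinate.

```python
import numpy as np

# Hypothetical channel layout (an assumption, not taken from the paper):
# channel name -> number of image planes it occupies.
CHANNELS = {"albedo": 3, "roughness": 1, "metallicity": 1}


def build_condition(available, height, width):
    """Stack intrinsic channels into one array, zero-filling missing ones.

    `available` maps channel names to arrays of shape (planes, H, W).
    Returns (condition, mask), where `condition` has shape
    (total planes, H, W) and `mask` holds one 0/1 flag per named channel.
    """
    parts, mask = [], []
    for name, planes in CHANNELS.items():
        image = available.get(name)
        if image is None:
            # Missing channel: fill with zeros and flag it as absent.
            parts.append(np.zeros((planes, height, width), dtype=np.float32))
            mask.append(0.0)
        else:
            image = np.asarray(image, dtype=np.float32)
            if image.shape != (planes, height, width):
                raise ValueError(f"{name}: unexpected shape {image.shape}")
            parts.append(image)
            mask.append(1.0)
    return np.concatenate(parts, axis=0), np.array(mask, dtype=np.float32)


# Usage: only albedo is specified; roughness and metallicity are left open.
condition, mask = build_condition({"albedo": np.ones((3, 4, 4))}, 4, 4)
```

The same masking idea would also let training samples from datasets with different available channels share one input format, since each sample simply carries its own availability mask.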

