HDR Environment Map Estimation with Latent Diffusion Models
This work addresses the problem of generating high-quality environment maps for lighting reflective surfaces in computer vision, representing an incremental improvement over existing methods.
The paper tackles HDR environment map estimation from a single-view image by using a Latent Diffusion Model with ERP convolutional padding and a panoramically-adapted Diffusion Transformer to reduce distortions and seams, achieving competitive performance with state-of-the-art methods in image quality and lighting accuracy.
We advance HDR environment map estimation from a single-view image by establishing a novel approach that leverages a Latent Diffusion Model (LDM) to produce high-quality environment maps that can plausibly light mirror-reflective surfaces. The equirectangular projection (ERP) representation, the format used by the vast majority of approaches, commonly suffers from distortions at the poles and a seam at the side borders of the environment map. We remove the border seam artefact by proposing an ERP convolutional padding in the latent autoencoder. Additionally, we investigate whether adapting the diffusion network architecture to the ERP format can improve the quality and accuracy of the estimated environment map by proposing a panoramically-adapted Diffusion Transformer architecture, PanoDiT. Our proposed PanoDiT network reduces ERP distortions and artefacts, but at the cost of image quality and plausibility. We evaluate on standard benchmarks to demonstrate that our models estimate high-quality environment maps that perform competitively with state-of-the-art approaches in both image quality and lighting accuracy.
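To illustrate the idea behind an ERP-aware convolutional padding, the following is a minimal sketch, not the paper's implementation. It assumes that the seam is avoided by wrapping columns around horizontally (longitude is periodic, so the left border continues from the right border) and that rows are simply edge-replicated at the poles. Both the function name `erp_pad` and the pole handling are hypothetical choices made for this example; the abstract does not specify them.

```python
def erp_pad(feature_map, pad):
    """Pad a 2D feature map (a list of equal-length rows) before a convolution.

    Assumption for illustration only: columns wrap around horizontally and
    rows are edge-replicated vertically. The paper's exact scheme may differ.
    """
    if pad < 0:
        raise ValueError("pad must be non-negative")
    width = len(feature_map[0])
    if pad > width:
        raise ValueError("pad must not exceed the map width")

    # Horizontal wrap: take `pad` columns from the opposite side of each row.
    wrapped = [row[width - pad:] + row + row[:pad] for row in feature_map]

    # Vertical edge replication (assumed pole handling): repeat first/last row.
    top = [list(wrapped[0]) for _ in range(pad)]
    bottom = [list(wrapped[-1]) for _ in range(pad)]
    return top + wrapped + bottom


# Toy 2x4 "latent": after padding, the left border sees the right border's values.
latent = [[1, 2, 3, 4],
          [5, 6, 7, 8]]
padded = erp_pad(latent, 1)
for row in padded:
    print(row)
```

With this kind of padding, a convolution kernel centred on the leftmost column reads values from the rightmost column rather than zeros, which is one plausible way a visible seam at the side borders could be avoided.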