CV GR Feb 4

X2HDR: HDR Image Generation in a Perceptually Uniform Space

Ronghuan Wu, Wanchao Su, Kede Ma, Jing Liao, Rafał K. Mantiuk

arXiv:2602.04814v16.94 citations h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses the lack of HDR image generation capabilities in current AI models, enabling applications in photography and display technology, though it is incremental as it adapts existing methods rather than introducing a new paradigm.

The paper addresses generating high-dynamic-range (HDR) images with diffusion models pretrained on low-dynamic-range (LDR) data, without retraining from scratch. It adapts these models in a perceptually uniform space and reports improved perceptual fidelity, text-image alignment, and effective dynamic range compared with previous methods.

High-dynamic-range (HDR) formats and displays are becoming increasingly prevalent, yet state-of-the-art image generators (e.g., Stable Diffusion and FLUX) typically remain limited to low-dynamic-range (LDR) output due to the lack of large-scale HDR training data. In this work, we show that existing pretrained diffusion models can be easily adapted to HDR generation without retraining from scratch. A key challenge is that HDR images are natively represented in linear RGB, whose intensity and color statistics differ substantially from those of sRGB-encoded LDR images. This gap, however, can be effectively bridged by converting HDR inputs into perceptually uniform encodings (e.g., using PU21 or PQ). Empirically, we find that LDR-pretrained variational autoencoders (VAEs) reconstruct PU21-encoded HDR inputs with fidelity comparable to LDR data, whereas linear RGB inputs cause severe degradations. Motivated by this finding, we describe an efficient adaptation strategy that freezes the VAE and finetunes only the denoiser via low-rank adaptation in a perceptually uniform space. This results in a unified computational method that supports both text-to-HDR synthesis and single-image RAW-to-HDR reconstruction. Experiments demonstrate that our perceptually encoded adaptation consistently improves perceptual fidelity, text-image alignment, and effective dynamic range, relative to previous techniques.
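To make the idea of a perceptually uniform encoding concrete, here is a minimal sketch of the PQ transfer function (SMPTE ST 2084), one of the two encodings the abstract names. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 10,000 cd/m² peak, and the rescaling to [-1, 1] (a common VAE input convention) are hypothetical choices made for this example, and the paper's main results use PU21 rather than PQ.

```python
import math

# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK = 10000.0  # PQ is defined for absolute luminance up to 10,000 cd/m^2


def pq_encode(luminance):
    """Map absolute linear luminance (cd/m^2) to a PQ code value in [0, 1]."""
    y = min(max(luminance, 0.0), PEAK) / PEAK
    yp = y ** M1
    return ((C1 + C2 * yp) / (1.0 + C3 * yp)) ** M2


def pq_decode(code):
    """Invert pq_encode: map a PQ code value in [0, 1] back to cd/m^2."""
    vp = min(max(code, 0.0), 1.0) ** (1.0 / M2)
    num = max(vp - C1, 0.0)
    den = C2 - C3 * vp
    return PEAK * (num / den) ** (1.0 / M1)


def to_vae_range(code):
    """Hypothetical helper: rescale [0, 1] codes to [-1, 1].

    Assumption: the frozen VAE expects inputs in [-1, 1], as many
    LDR-pretrained latent diffusion VAEs do.
    """
    return 2.0 * code - 1.0


# Usage: encode a few linear luminance values spanning a wide dynamic range.
linear_values = [0.01, 1.0, 100.0, 1000.0, 10000.0]
encoded = [pq_encode(v) for v in linear_values]
vae_inputs = [to_vae_range(c) for c in encoded]
```

In this encoding, 100 cd/m² lands at a code value of about 0.51, so code values are spread far more evenly across the range than linear luminance would be. This is the kind of redistribution that brings HDR inputs closer to the sRGB-like statistics an LDR-pretrained VAE was trained on.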
