CV · May 12

Single-Shot HDR Recovery via a Video Diffusion Prior

Chinmay Talegaonkar, Jinshi He, Christopher McKenna, Nicholas Antipa

arXiv:2605.11628 · 14.5

Predicted impact: top 64% in CV · last 90 days
Originality: Incremental advance

AI Analysis

For computer vision researchers, this provides a simple, interpretable, and high-fidelity method for single-shot HDR recovery that also generalizes to other image reconstruction tasks.

The authors tackle single-shot HDR reconstruction by re-casting it as conditional video generation, finetuning a video diffusion model to produce an exposure bracket from a single LDR input, then fusing frames with learned per-pixel weights. They outperform state-of-the-art generative baselines on several metrics and are preferred by human evaluators in 72% of pairwise comparisons.

Recent generative methods for single-shot high dynamic range (HDR) image reconstruction show promising results, but often struggle to preserve fidelity to the input image. They either require separate models to handle highlights and shadows, or sacrifice interpretability by directly predicting the final HDR image. We address these limitations by re-casting single-shot HDR reconstruction as conditional video generation and fusing the generated frames into an HDR image. We finetune a video diffusion model to generate an exposure bracket, conditioned on a low dynamic range (LDR) input. We fuse this image bracket using per-pixel weights predicted by a lightweight UNet. This formulation is simple, interpretable, and effective. Rather than directly hallucinating an HDR image, it explicitly reconstructs the intermediate exposure stack and fuses it into the final output. Our method eliminates the need for separate models across exposure regimes and produces HDR reconstructions with high input fidelity. On quantitative benchmarks, we outperform state-of-the-art generative baselines with comparable model capacity on several reconstruction metrics. Human evaluators further prefer our results in 72% of pairwise comparisons against existing methods. Finally, we show that this input-conditioned sequence generation and fusion framework extends beyond HDR to other image reconstruction tasks, such as all-in-focus image recovery from a single defocus-blurred input.
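
The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes the bracket frames are in a linear domain with known relative exposure times, and it uses a classical weighted average of exposure-normalized frames; the abstract does not give the fusion formula, so this is an assumption rather than the authors' method. The name `fuse_bracket`, its parameters, and the hat-function fallback weights are hypothetical; in the paper the per-pixel weights come from a UNet, which is not modeled here.

```python
import numpy as np


def fuse_bracket(frames, exposure_times, weights=None, eps=1e-8):
    """Fuse an exposure bracket into a single radiance map (illustrative only).

    frames: (N, H, W, C) array in [0, 1], assumed linear (no gamma curve).
    exposure_times: N relative exposure times (assumed known).
    weights: optional per-pixel weights broadcastable to frames. The paper
        predicts these with a UNet; if omitted, a hand-crafted hat function
        is used as a stand-in (an assumption, not the authors' choice).
    """
    frames = np.asarray(frames, dtype=np.float64)
    t = np.asarray(exposure_times, dtype=np.float64).reshape(-1, 1, 1, 1)
    if weights is None:
        # Hat weights: 1 at mid-gray, 0 at pure black or clipped white.
        weights = 1.0 - np.abs(2.0 * frames - 1.0)
    weights = np.broadcast_to(np.asarray(weights, dtype=np.float64), frames.shape)
    # Each frame estimates radiance as pixel_value / exposure_time;
    # the fused result is the per-pixel weighted mean of those estimates.
    weighted_sum = (weights * frames / t).sum(axis=0)
    return weighted_sum / np.maximum(weights.sum(axis=0), eps)


# Synthetic usage: a 2x2 single-channel scene whose radiance exceeds 1.0,
# captured at three exposures with sensor clipping at 1.0.
scene = np.array([[[0.05], [0.5]], [[1.5], [3.0]]])
times = [0.25, 1.0, 4.0]
bracket = np.stack([np.clip(scene * t, 0.0, 1.0) for t in times])
recovered = fuse_bracket(bracket, times)
```

Because clipped pixels receive zero weight under the hat function, each output pixel is estimated only from frames in which it is well exposed. That is also what makes the intermediate bracket inspectable: each frame's contribution to the final image is explicit.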
