CV · Mar 11, 2024

When No-Reference Image Quality Models Meet MAP Estimation in Diffusion Latents

arXiv:2403.06406v2 · 1 citations · h-index: 73
Originality: Incremental advance
AI Analysis

This work provides a novel perceptual optimization method for image enhancement, offering complementary insights into NR-IQA models, though it is incremental in applying existing models to a new framework.

The authors tackled the problem of using no-reference image quality assessment (NR-IQA) models as priors for image enhancement by integrating them into a maximum a posteriori (MAP) estimation framework in diffusion latent space, resulting in noticeably better enhancement of real-world images with unknown distortions while preserving fidelity.

Contemporary no-reference image quality assessment (NR-IQA) models can effectively quantify perceived image quality, often achieving strong correlations with human perceptual scores on standard IQA benchmarks. Yet limited effort has been devoted to treating NR-IQA models as natural image priors for real-world image enhancement, and consequently to comparing them from a perceptual optimization standpoint. In this work, we show, for the first time, that NR-IQA models can be plugged into the maximum a posteriori (MAP) estimation framework for image enhancement. This is achieved by performing gradient ascent in the diffusion latent space rather than in the raw pixel domain, leveraging a pretrained, differentiable, and bijective diffusion process. Different NR-IQA models are likely to lead to different enhanced outputs, which in turn provides a new computational means of comparing them. Unlike conventional correlation-based measures, our comparison method offers complementary insights into the respective strengths and weaknesses of the competing NR-IQA models in perceptual optimization scenarios. Additionally, we aim to improve the best-performing NR-IQA model in diffusion latent MAP estimation by incorporating the advantages of other top-performing methods. The resulting model delivers noticeably better results in enhancing real-world images afflicted by unknown and complex distortions, all while preserving a high degree of image fidelity.
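The idea of gradient ascent in a latent space instead of pixel space can be illustrated with a toy sketch. This is not the authors' implementation: the invertible linear map `A` stands in for the pretrained bijective diffusion process, the quadratic `quality` function stands in for a differentiable NR-IQA model, and the quadratic fidelity term with weight `lam` is an assumed likelihood. All names here are hypothetical.

```python
import numpy as np

d = 4
# Assumption: an invertible linear map x = A @ z + b plays the role of the
# bijective, differentiable diffusion process (latent z -> image x).
A = np.eye(d) + 0.1 * np.triu(np.ones((d, d)), 1)
b = np.zeros(d)

def decode(z):
    return A @ z + b

def encode(x):
    return np.linalg.solve(A, x - b)

# Assumption: a toy differentiable "quality prior" peaked at mu,
# standing in for an NR-IQA model's score.
mu = np.ones(d)

def quality(x):
    return -0.5 * np.sum((x - mu) ** 2)

def quality_grad(x):
    return -(x - mu)

def objective(x, y, lam):
    # Quality prior plus an assumed quadratic fidelity term to the input y.
    return quality(x) - 0.5 * lam * np.sum((x - y) ** 2)

def enhance(y, lam=1.0, lr=0.1, steps=500):
    z = encode(y)  # start from the latent of the degraded input
    for _ in range(steps):
        x = decode(z)
        grad_x = quality_grad(x) - lam * (x - y)
        z = z + lr * (A.T @ grad_x)  # chain rule: dJ/dz = A^T dJ/dx
    return decode(z)

y = np.zeros(d)  # toy "degraded" input
x_hat = enhance(y)
```

In this toy setting the maximizer has a closed form, `(mu + lam * y) / (1 + lam)`, so the ascent can be checked against it. Swapping in a different `quality` function changes the enhanced output, which mirrors the comparison idea described in the abstract.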

