CV · Sep 1, 2025

MILO: A Lightweight Perceptual Quality Metric for Image and Latent-Space Optimization

arXiv:2509.01411v1 · 1 citation · h-index: 48 · ACM Trans Graph
Originality: Incremental advance
AI Analysis

This work addresses the need for accurate and fast image quality metrics for real-time applications and generative pipelines, though it is incremental as it builds on existing distortion and masking techniques.

The paper tackles full-reference image quality assessment by introducing MILO, a lightweight perceptual metric trained with pseudo-MOS supervision. MILO outperforms existing metrics on standard benchmarks and, used as a perceptual loss, enables efficient optimization in tasks such as denoising and super-resolution while reducing computational overhead.

We present MILO (Metric for Image- and Latent-space Optimization), a lightweight, multiscale, perceptual metric for full-reference image quality assessment (FR-IQA). MILO is trained using pseudo-MOS (Mean Opinion Score) supervision, in which reproducible distortions are applied to diverse images and scored via an ensemble of recent quality metrics that account for visual masking effects. This approach enables accurate learning without requiring large-scale human-labeled datasets. Despite its compact architecture, MILO outperforms existing metrics across standard FR-IQA benchmarks and offers fast inference suitable for real-time applications. Beyond quality prediction, we demonstrate the utility of MILO as a perceptual loss in both image and latent domains. In particular, we show that spatial masking modeled by MILO, when applied to latent representations from a VAE encoder within Stable Diffusion, enables efficient and perceptually aligned optimization. By combining spatial masking with a curriculum learning strategy, we first process perceptually less relevant regions before progressively shifting the optimization to more visually distorted areas. This strategy leads to significantly improved performance in tasks like denoising, super-resolution, and face restoration, while also reducing computational overhead. MILO thus functions as both a state-of-the-art image quality metric and as a practical tool for perceptual optimization in generative pipelines.
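The pseudo-MOS idea described in the abstract (apply a reproducible distortion, score the result with an ensemble of metrics, use the combined score as a training label) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the distortion (`add_gaussian_noise`), the two toy stand-in metrics (`mse_quality`, `mae_quality`), and the plain averaging in `pseudo_mos` are not taken from the paper, which uses an ensemble of recent masking-aware quality metrics that this page does not list.

```python
import numpy as np

def add_gaussian_noise(image, sigma, seed=0):
    """Reproducible distortion: additive Gaussian noise drawn with a fixed seed."""
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def mse_quality(ref, dist):
    """Toy stand-in metric: maps mean squared error to (0, 1]; 1 means identical."""
    return 1.0 / (1.0 + 100.0 * np.mean((ref - dist) ** 2))

def mae_quality(ref, dist):
    """Toy stand-in metric: maps mean absolute error to (0, 1]; 1 means identical."""
    return 1.0 / (1.0 + 10.0 * np.mean(np.abs(ref - dist)))

def pseudo_mos(ref, dist, metrics):
    """Average the ensemble's scores into a single pseudo-MOS label (assumed rule)."""
    return float(np.mean([metric(ref, dist) for metric in metrics]))

# Hypothetical data: a random grayscale "image" with values in [0, 1).
reference = np.random.default_rng(42).random((32, 32))
metrics = [mse_quality, mae_quality]

# One label per distortion level; stronger noise should yield a lower label.
labels = [
    pseudo_mos(reference, add_gaussian_noise(reference, sigma), metrics)
    for sigma in (0.0, 0.05, 0.2)
]
```

In a setup like this, each (reference, distorted, label) triple would serve as a training example for the learned metric, which is how such supervision avoids the need for human ratings. A real ensemble would likely normalize each metric's range before combining, since published metrics do not share a common scale.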

