CV · May 30

VICR: Visual In-Context Restoration for Real-World Image Super-Resolution

arXiv:2606.00704 · 28.8 · h-index: 3
Predicted impact: top 81% in CV · last 90 days · Originality: Incremental advance
AI Analysis

This work addresses structural drift and semantic inconsistency in generative super-resolution for real-world images, offering a more robust conditioning mechanism.

VICR proposes a Diffusion Transformer framework for real-world image super-resolution that treats the task as image completion with decoupled visual priors, achieving state-of-the-art performance on multiple benchmarks with only 127M trainable parameters.

Real-world image super-resolution (Real-ISR) requires balancing structural fidelity to degraded observations with realistic detail synthesis. However, existing generative Real-ISR methods often rely on entangled conditioning mechanisms, leading to structural drift or semantically inconsistent details. To address this issue, we propose Visual In-Context Restoration (VICR), a Diffusion Transformer (DiT)-based framework that formulates Real-ISR as image completion. Specifically, we introduce a decoupled visual prior injection mechanism that derives local and global cues from the low-quality (LQ) image: local cues help recover image structures and support high-frequency detail synthesis, while global cues guide overall generation and promote semantic consistency. For ambiguous regions under severe degradation, VICR employs an inference-time agent to refine semantic prompts using visual evidence from the LQ input while keeping model parameters fixed. Experiments show that VICR achieves state-of-the-art performance across multiple Real-ISR benchmarks with only 127M trainable parameters.
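
To make the "image completion with decoupled cues" idea concrete, here is a minimal sketch of how such an input could be assembled. It is not the VICR implementation: the abstract does not specify how the cues are computed, so the patch tokens standing in for local cues, the mean/variance vector standing in for the global cue, the zero-filled target placeholders, and all function names (`patchify`, `global_cue`, `build_completion_input`) are illustrative assumptions. It also assumes the LQ image has already been resized to the target resolution.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def patchify(img: Image, p: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping p x p patches, each flattened."""
    h, w = len(img), len(img[0])
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            patches.append(
                [img[top + dy][left + dx] for dy in range(p) for dx in range(p)]
            )
    return patches


def global_cue(img: Image) -> List[float]:
    """Hypothetical global descriptor: mean and population variance of all pixels."""
    pixels = [v for row in img for v in row]
    mean = sum(pixels) / len(pixels)
    var = sum((v - mean) ** 2 for v in pixels) / len(pixels)
    return [mean, var]


def build_completion_input(
    lq: Image, p: int
) -> Tuple[List[List[float]], List[float], List[bool]]:
    """Assemble an in-context sequence: LQ patch tokens followed by empty target slots.

    Returns (tokens, global_vector, is_target). A generative model would be asked
    to fill in only the positions where is_target is True, with the LQ tokens
    serving as visible context and the global vector as separate conditioning.
    """
    local_tokens = patchify(lq, p)  # local cues (assumption: raw patches)
    n = len(local_tokens)
    target_tokens = [[0.0] * (p * p) for _ in range(n)]  # placeholders to complete
    tokens = local_tokens + target_tokens
    is_target = [False] * n + [True] * n
    return tokens, global_cue(lq), is_target


if __name__ == "__main__":
    lq = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    tokens, g, is_target = build_completion_input(lq, p=2)
    print(len(tokens), g, is_target)
```

Keeping the global vector out of the token sequence is one simple way to keep the two kinds of cue decoupled in this toy setup; how the paper injects them into the DiT is not described in the abstract.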
