CV | Nov 25, 2025

OmniRefiner: Reinforcement-Guided Local Diffusion Refinement

arXiv:2511.19990v1 | 5 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses a key limitation in diffusion models for image refinement, benefiting applications in image editing and restoration, though it appears incremental as it builds on existing diffusion and reinforcement learning techniques.

The paper tackles the problem of preserving fine-grained visual details in reference-guided image generation by introducing OmniRefiner, a framework that combines two-stage reference-driven correction with reinforcement learning to enhance pixel-level consistency. It reports significant improvements in reference alignment and detail preservation, outperforming both open-source and commercial models on challenging benchmarks.

Reference-guided image generation has progressed rapidly, yet current diffusion models still struggle to preserve fine-grained visual details when refining a generated image using a reference. This limitation arises because VAE-based latent compression inherently discards subtle texture information, causing identity- and attribute-specific cues to vanish. Moreover, post-editing approaches that amplify local details based on existing methods often produce results inconsistent with the original image in terms of lighting, texture, or shape. To address this, we introduce OmniRefiner, a detail-aware refinement framework that performs two consecutive stages of reference-driven correction to enhance pixel-level consistency. We first adapt a single-image diffusion editor by fine-tuning it to jointly ingest the draft image and the reference image, enabling globally coherent refinement while maintaining structural fidelity. We then apply reinforcement learning to further strengthen localized editing capability, explicitly optimizing for detail accuracy and semantic consistency. Extensive experiments demonstrate that OmniRefiner significantly improves reference alignment and fine-grained detail preservation, producing faithful and visually coherent edits that surpass both open-source and commercial models on challenging reference-guided restoration benchmarks.
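The abstract's point that spatial compression discards subtle texture can be illustrated with a toy round trip. The sketch below is not the paper's method and does not use a real VAE: it substitutes plain average pooling followed by nearest-neighbour upsampling, and the 8x factor and the helper names (`box_downsample`, `nearest_upsample`, `round_trip_error`) are assumptions chosen for illustration. It only shows that a one-pixel texture is destroyed by such a round trip while coarse structure survives.

```python
# Illustrative stand-in only: average pooling is NOT a VAE, and the 8x
# compression factor is an assumed value, not taken from the paper.

def box_downsample(img, f):
    """Average each f x f block of a 2D list-of-lists image.

    Assumes the image height and width are multiples of f.
    """
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y + dy][x + dx] for dy in range(f) for dx in range(f)) / (f * f)
            for x in range(0, w, f)
        ]
        for y in range(0, h, f)
    ]


def nearest_upsample(img, f):
    """Repeat each pixel into an f x f block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(f)]
        out.extend(list(wide) for _ in range(f))
    return out


def mse(a, b):
    """Mean squared error between two equally sized 2D images."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)) / n


def round_trip_error(img, f=8):
    """Error introduced by compressing and then expanding the image."""
    return mse(img, nearest_upsample(box_downsample(img, f), f))


SIZE = 32

# Fine texture: a one-pixel checkerboard (highest spatial frequency).
texture = [[float((x + y) % 2) for x in range(SIZE)] for y in range(SIZE)]

# Coarse structure: left half dark, right half bright.
coarse = [[0.0 if x < SIZE // 2 else 1.0 for x in range(SIZE)] for y in range(SIZE)]

texture_err = round_trip_error(texture)
coarse_err = round_trip_error(coarse)

print(f"fine texture round-trip MSE: {texture_err}")
print(f"coarse layout round-trip MSE: {coarse_err}")
```

In this toy setting the checkerboard collapses to uniform grey while the two-tone layout is reconstructed exactly, which mirrors, in a very simplified way, why a refinement stage that looks at the reference image directly can recover cues the compressed representation no longer carries.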

