CV | Mar 20

Improving Image-to-Image Translation via a Rectified Flow Reformulation

Satoshi Iizuka, Shun Okamoto, Kazuhiro Fukui

arXiv:2603.20186 | 9.4 | h-index: 2

AI Analysis

This work addresses a practical problem for researchers and practitioners in computer vision by offering a lightweight plug-in method to enhance conventional image-to-image translation models without heavy generative pipelines.

The paper tackles over-smoothing on ill-posed, multimodal targets in image-to-image translation by proposing I2I-RFR, a reformulation that recasts regression networks as continuous-time transport models. The authors report improved performance across tasks, with the clearest gains in perceptual quality and detail preservation.

In this work, we propose Image-to-Image Rectified Flow Reformulation (I2I-RFR), a practical plug-in reformulation that recasts standard I2I regression networks as continuous-time transport models. While pixel-wise I2I regression is simple, stable, and easy to adapt across tasks, it often over-smooths ill-posed and multimodal targets, whereas generative alternatives often require additional components, task-specific tuning, and more complex training and inference pipelines. Our method augments the backbone input by channel-wise concatenation with a noise-corrupted version of the ground-truth target and optimizes a simple t-reweighted pixel loss. This objective admits a rectified-flow interpretation via an induced velocity field, enabling ODE-based progressive refinement at inference time while largely preserving the standard supervised training pipeline. In most cases, adopting I2I-RFR requires only expanding the input channels, and inference can be performed with a few explicit solver steps (e.g., 3 steps) without distillation. Extensive experiments across multiple image-to-image translation and video restoration tasks show that I2I-RFR generally improves performance across a wide range of tasks and backbones, with particularly clear gains in perceptual quality and detail preservation. Overall, I2I-RFR provides a lightweight way to incorporate continuous-time refinement into conventional I2I models without requiring a heavy generative pipeline.
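The training input and few-step inference described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the linear path x_t = (1 - t) * noise + t * target, the 1 / (1 - t)^2 loss weight, the induced velocity (x1_hat - x_t) / (1 - t), and the names `make_training_input`, `weighted_pixel_loss`, `euler_refine`, and `predict_target` are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def make_training_input(source, target, t, rng):
    """Concatenate the source image with a noise-corrupted target.

    Assumption: a linear path x_t = (1 - t) * noise + t * target, so that
    t = 0 is pure noise and t = 1 is the clean target.
    Arrays are channel-first, shape (C, H, W).
    """
    noise = rng.standard_normal(target.shape)
    x_t = (1.0 - t) * noise + t * target
    # Only the input channel count grows; the backbone is otherwise unchanged.
    return np.concatenate([source, x_t], axis=0), x_t


def weighted_pixel_loss(pred, target, t, eps=1e-3):
    """Pixel MSE with a t-dependent weight.

    Assumption: the weight 1 / (1 - t)^2 is the one obtained by writing the
    squared error of the induced velocity in terms of the pixel error. The
    paper's actual weighting may differ.
    """
    weight = 1.0 / max(1.0 - t, eps) ** 2
    return weight * np.mean((pred - target) ** 2)


def euler_refine(predict_target, source, target_shape, num_steps=3, rng=None):
    """Refine from noise to an output with explicit Euler steps.

    predict_target(net_input, t) stands in for the trained regression
    network and returns an estimate of the clean target.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = rng.standard_normal(target_shape)  # start at t = 0 (pure noise)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so (1 - t) is never zero
        net_input = np.concatenate([source, x], axis=0)
        x1_hat = predict_target(net_input, t)
        velocity = (x1_hat - x) / (1.0 - t)  # induced velocity field
        x = x + dt * velocity
    return x
```

Under these assumptions the final Euler step lands on the network's last prediction, because dt equals 1 - t at that step. A call such as `euler_refine(model_fn, source, target_shape, num_steps=3)` would mirror the 3-step inference mentioned in the abstract, with `model_fn` being whatever trained backbone is plugged in.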
