CV · AI · Mar 6

Reflective Flow Sampling Enhancement

Zikai Zhou, Muyao Wang, Shitong Shao, Lichen Bai, Haoyi Xiong, Bo Han, Zeke Xie

arXiv:2603.06165v1 · 11.1 · h-index: 9

Predicted impact: top 38% in CV · last 90 days
Originality: Incremental advance

AI Analysis

This work addresses a gap in inference-time enhancement for text-to-image flow models, offering a new inference method with potential test-time scaling. The advance is incremental because it builds on existing flow matching techniques.

The paper tackles inference-time enhancement for text-to-image flow models, on which existing methods often fail. It proposes Reflective Flow Sampling (RF-Sampling), a training-free framework that improves generation quality and prompt alignment, as shown by extensive experiments across multiple benchmarks.

The growing demand for text-to-image generation has led to rapid advances in generative modeling. Recently, text-to-image diffusion models trained with flow matching algorithms, such as FLUX, have achieved remarkable progress and emerged as strong alternatives to conventional diffusion models. At the same time, inference-time enhancement strategies have been shown to improve the generation quality and text-prompt alignment of text-to-image diffusion models. However, these techniques apply mainly to conventional diffusion models and usually perform poorly on flow models. To bridge this gap, we propose Reflective Flow Sampling (RF-Sampling), a theoretically grounded, training-free inference enhancement framework designed explicitly for flow models, especially CFG-distilled variants such as FLUX (i.e., models distilled from classifier-free guidance). Departing from heuristic interpretations, we provide a formal derivation proving that RF-Sampling implicitly performs gradient ascent on the text-image alignment score. By combining a linear combination of textual representations with flow inversion, RF-Sampling allows the model to explore noise spaces that are more consistent with the input prompt. Extensive experiments across multiple benchmarks demonstrate that RF-Sampling consistently improves both generation quality and prompt alignment. Moreover, RF-Sampling is the first inference enhancement method to exhibit some degree of test-time scaling on FLUX.
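The abstract names two ingredients, a linear combination of textual representations and flow inversion, without giving the algorithm. The toy sketch below only illustrates how such ingredients could interact in a single step and should not be read as the paper's method. Everything in it is an assumption made for illustration: the helper names (`mix_text`, `euler_step`, `reflective_step`), the weights `w_high` and `w_low`, the use of a null embedding as the second term of the combination, and the toy velocity field `text - x` that stands in for a real flow model.

```python
import numpy as np

def mix_text(cond, null, weight):
    """Linear combination of a prompt embedding and a null embedding (assumed form)."""
    return null + weight * (cond - null)

def toy_velocity(x, text):
    """Hypothetical stand-in for a flow model: pulls the sample toward the text vector."""
    return text - x

def euler_step(x, dt, text):
    """One explicit Euler step of dx/dt = toy_velocity(x, text)."""
    return x + dt * toy_velocity(x, text)

def reflective_step(x, dt, cond, null, w_high=1.5, w_low=0.5):
    """Step forward with amplified text, then invert with weakened text.

    With equal weights the two steps nearly cancel; with w_high > w_low the
    round trip leaves a net displacement toward the prompt. Both weights are
    illustrative values, not taken from the paper.
    """
    x_fwd = euler_step(x, dt, mix_text(cond, null, w_high))
    x_back = euler_step(x_fwd, -dt, mix_text(cond, null, w_low))
    return x_back

cond = np.ones(4)    # toy prompt embedding
null = np.zeros(4)   # toy null (empty-prompt) embedding
x0 = np.zeros(4)     # toy starting latent
x1 = reflective_step(x0, dt=0.1, cond=cond, null=null)
print(np.linalg.norm(cond - x0), np.linalg.norm(cond - x1))
```

In this toy setting the latent after the forward-then-inverse round trip sits closer to the prompt vector than it started, which is one intuition for how a mismatch between the forward and inversion conditioning could steer the sample toward the prompt.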
