CV · Apr 8, 2025

A Training-Free Style-aligned Image Generation with Scale-wise Autoregressive Model

arXiv:2504.06144v1 · 5 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses style consistency and inference speed for users of text-to-image models. It is an incremental advance because it builds on existing autoregressive and diffusion-based methods.

The paper tackles style misalignment and slow inference in text-to-image generation by proposing a training-free method with three components. It reports generation quality comparable to competing approaches, significantly improved style alignment, and inference more than six times faster than the fastest competing model.

We present a training-free style-aligned image generation method that leverages a scale-wise autoregressive model. While large-scale text-to-image (T2I) models, particularly diffusion-based methods, have demonstrated impressive generation quality, they often suffer from style misalignment across generated image sets and slow inference speeds, limiting their practical usability. To address these issues, we propose three key components: initial feature replacement to ensure consistent background appearance, pivotal feature interpolation to align object placement, and dynamic style injection, which reinforces style consistency using a schedule function. Unlike previous methods requiring fine-tuning or additional training, our approach maintains fast inference while preserving individual content details. Extensive experiments show that our method achieves generation quality comparable to competing approaches, significantly improves style alignment, and delivers inference speeds over six times faster than the fastest model.
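
To make the three components more concrete, here is a minimal sketch of how they might fit together across the scales of a scale-wise autoregressive model. It is illustrative only and not the authors' implementation: the abstract does not give the exact operations, so everything here is an assumption. That includes the function names, treating batch index 0 as the style reference, applying replacement at the first scale, the choice of pivot step, the linear schedule, and modelling style injection as a simple feature blend.

```python
import numpy as np

def linear_schedule(step, num_steps, start=1.0, end=0.0):
    """Hypothetical schedule: blend weight moving linearly from start to end."""
    if num_steps <= 1:
        return start
    t = step / (num_steps - 1)
    return (1.0 - t) * start + t * end

def replace_initial(features):
    """Copy the reference features (batch index 0) over every image in the batch."""
    return np.broadcast_to(features[:1], features.shape).copy()

def blend_toward_reference(features, weight):
    """Pull each image's features toward the reference by `weight` in [0, 1]."""
    ref = features[:1]
    return (1.0 - weight) * features + weight * ref

def align_batch(per_scale_features, pivot_step=1, pivot_weight=0.5):
    """Apply the three assumed operations over a list of per-scale feature arrays.

    Each array has shape (batch, tokens, channels); index 0 is the reference.
    """
    num_steps = len(per_scale_features)
    aligned = []
    for step, feats in enumerate(per_scale_features):
        if step == 0:
            # Assumed "initial feature replacement": share the coarsest features.
            feats = replace_initial(feats)
        elif step == pivot_step:
            # Assumed "pivotal feature interpolation": fixed-weight blend at one step.
            feats = blend_toward_reference(feats, pivot_weight)
        else:
            # Assumed "dynamic style injection": blend weight set by the schedule.
            feats = blend_toward_reference(feats, linear_schedule(step, num_steps))
        aligned.append(feats)
    return aligned
```

In this sketch the reference image's features are never altered, and the scheduled weight decays to zero at the finest scale so that each image keeps its own fine detail. That mirrors the abstract's stated goal of preserving individual content, though the real schedule function may differ.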

