CV | AI | May 10

When Few Steps Are Enough: Training-Free Acceleration of Identity-Preserved Generation

arXiv:2605.09460
AI Analysis

For practitioners of personalized image generation, this provides a simple, training-free method to drastically reduce inference cost while maintaining or improving identity fidelity.

The authors show that identity-preserved FLUX generation can be accelerated by swapping the multi-step dev backbone for the distilled schnell backbone without retraining. The swap cuts latency by 5.9x while raising ArcFace identity similarity by 0.028 and lowering LPIPS by 0.016 (lower is better) relative to the 28-step dev baseline.
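As a rough illustration of how small the change is, the swap can be written as a configuration edit. This is a minimal sketch, not the authors' code: `SamplingConfig`, `to_schnell`, the adapter path, the baseline guidance value of 3.5, and the 4-step default are all assumptions made for the example. Only the 28-step dev baseline, the schnell backbone, the frozen adapter, and the disabled classifier-free guidance come from the summary above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SamplingConfig:
    """Hypothetical container for the settings the swap touches."""
    backbone_path: str     # which FLUX weights to load
    num_steps: int         # number of denoising steps
    guidance_scale: float  # 0.0 is used here to mean "guidance disabled"
    adapter_path: str      # frozen identity adapter, never retrained


# Baseline: 28-step dev (from the text). The guidance value is a placeholder.
baseline = SamplingConfig(
    backbone_path="black-forest-labs/FLUX.1-dev",
    num_steps=28,
    guidance_scale=3.5,                       # assumption, not from the paper
    adapter_path="path/to/infusenet_adapter"  # hypothetical path
)


def to_schnell(cfg: SamplingConfig, num_steps: int = 4) -> SamplingConfig:
    """Swap the backbone and disable guidance; the adapter stays untouched.

    The default of 4 steps is an assumption for this sketch.
    """
    return replace(
        cfg,
        backbone_path="black-forest-labs/FLUX.1-schnell",
        guidance_scale=0.0,
        num_steps=num_steps,
    )


fast = to_schnell(baseline)
```

The point of the sketch is that `adapter_path` is carried over unchanged, which mirrors the claim that the adapter trained with dev transfers without retraining.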

Identity-preserved image generation is typically built on many-step diffusion backbones, making personalized generation expensive at deployment time. We show that this cost is often unnecessary for identity-conditioned FLUX generation. A frozen InfuseNet identity adapter trained with dev transfers directly to the distilled schnell backbone without retraining. This two-line replacement -- changing the backbone path and disabling classifier-free guidance -- reduces latency by 5.9x while improving ArcFace identity similarity by +0.028 and LPIPS by -0.016 over the standard 28-step dev baseline. To explain why this works, we analyze the denoising trajectory and find that identity fidelity enters an early effective regime, often within 4-8 steps, while later steps primarily refine visual detail, sharpness, and contrast. Adapter ablations confirm that identity formation depends on the identity adapter, while attention-stream norm probes suggest that the relative conditioning contribution decreases as sampling proceeds. Preliminary style-adapter and object-adapter sweeps on SDXL and SD1.5 show similar diminishing returns after intermediate steps. These results position distilled backbone replacement as a simple, training-free strategy for improving the efficiency-fidelity tradeoff of identity-preserved generation.
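One way to make the "early effective regime" concrete is to ask at which step a per-step identity score first comes close to its final value. The helper below is a sketch under assumptions: the abstract does not state the criterion the authors use, so the "fraction of the final value" rule, the function name `first_effective_step`, and the 0.95 threshold are illustrative choices. The numbers in `toy_curve` are made up for the example and are not results from the paper.

```python
from typing import Sequence


def first_effective_step(similarity_by_step: Sequence[float],
                         fraction: float = 0.95) -> int:
    """Return the 1-indexed step where similarity first reaches
    `fraction` of its final value.

    Assumes the final similarity is positive; the criterion itself is
    an assumption of this sketch, not the paper's definition.
    """
    if not similarity_by_step:
        raise ValueError("similarity_by_step must not be empty")
    target = fraction * similarity_by_step[-1]
    for step, sim in enumerate(similarity_by_step, start=1):
        if sim >= target:
            return step
    return len(similarity_by_step)


# Made-up identity similarity after each of 8 denoising steps.
toy_curve = [0.10, 0.30, 0.50, 0.58, 0.60, 0.61, 0.61, 0.62]
step = first_effective_step(toy_curve)
```

On a curve that flattens early, the returned step is small compared with the total step count, which is the pattern the abstract describes for identity fidelity.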
