CV Apr 22, 2025

FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation

Zebin Yao, Lei Ren, Huixing Jiang, Chen Wei, Xiaojie Wang, Ruifan Li, Fangxiang Feng

arXiv:2504.15958v310.22 citations h-index: 22 Has Code

Originality: Incremental advance

AI Analysis

FreeGraftor addresses the challenge of generating images of specific subjects from text prompts without fine-tuning, which makes it practical for real-world deployment, though it is an incremental improvement over prior methods.

The paper tackles subject-driven image generation, where existing methods trade off fidelity against efficiency. It proposes FreeGraftor, a training-free framework that uses cross-image feature grafting to achieve precise subject identity transfer and text-aligned scene synthesis. The authors report that it significantly outperforms existing zero-shot and training-free approaches.

Subject-driven image generation aims to synthesize novel scenes that faithfully preserve subject identity from reference images while adhering to textual guidance. However, existing methods struggle with a critical trade-off between fidelity and efficiency. Tuning-based approaches rely on time-consuming and resource-intensive, subject-specific optimization, while zero-shot methods often fail to maintain adequate subject consistency. In this work, we propose FreeGraftor, a training-free framework that addresses these limitations through cross-image feature grafting. Specifically, FreeGraftor leverages semantic matching and position-constrained attention fusion to transfer visual details from reference subjects to the generated images. Additionally, our framework introduces a novel noise initialization strategy to preserve the geometry priors of reference subjects, facilitating robust feature matching. Extensive qualitative and quantitative experiments demonstrate that our method enables precise subject identity transfer while maintaining text-aligned scene synthesis. Without requiring model fine-tuning or additional training, FreeGraftor significantly outperforms existing zero-shot and training-free approaches in both subject fidelity and text alignment. Furthermore, our framework can seamlessly extend to multi-subject generation, making it practical for real-world deployment. Our code is available at https://github.com/Nihukat/FreeGraftor.
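The idea of semantic matching combined with a position constraint can be illustrated with a toy sketch. This is not the authors' implementation: the abstract describes fusion inside the model's attention layers, whereas the sketch below simply copies matched reference features into the generated feature list. The function name `graft_features`, the `subject_mask` argument, the cosine-similarity criterion, and the `threshold` value are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def graft_features(gen_feats, ref_feats, subject_mask, threshold=0.8):
    """Toy cross-image grafting (hypothetical helper, not the paper's code).

    For every generated token whose position is allowed by subject_mask,
    find the most similar reference token; if the similarity reaches the
    threshold, replace the generated feature with the reference feature.
    Returns the grafted features and a {generated_index: reference_index} map.
    """
    grafted = [list(f) for f in gen_feats]  # copy so the input is not mutated
    matches = {}
    for i, g in enumerate(gen_feats):
        if not subject_mask[i]:
            continue  # position constraint: leave tokens outside the subject region alone
        sims = [cosine(g, r) for r in ref_feats]
        j = max(range(len(ref_feats)), key=sims.__getitem__)
        if sims[j] >= threshold:
            grafted[i] = list(ref_feats[j])
            matches[i] = j
    return grafted, matches


# Three generated tokens, two reference tokens; the last generated token
# lies outside the assumed subject region and is never modified.
gen = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ref = [[2.0, 0.1], [0.1, 3.0]]
mask = [True, True, False]
grafted, matches = graft_features(gen, ref, mask)
```

In this toy setting the mask plays the role of the position constraint: only tokens inside the subject region are candidates for grafting, so the text-driven background is left untouched.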
