CV · Jun 5, 2024

Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter

arXiv:2406.02881v2 · 2 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for efficient, high-fidelity personalization in text-to-image generation and represents an incremental improvement over existing methods.

The paper tackles ID customization generation in text-to-image models by proposing Inv-Adapter, which works without an additional image encoder and reports competitive ID fidelity, generation loyalty, and speed with fewer model parameters.

The remarkable advancement in text-to-image generation models has significantly boosted research in ID customization generation. However, existing personalization methods cannot simultaneously satisfy high-fidelity and high-efficiency requirements. Their main bottleneck lies in the prompt image encoder, which produces signals that are only weakly aligned with the text-to-image model and significantly increases model size. To this end, we propose a lightweight Inv-Adapter, which first extracts diffusion-domain representations of ID images using a pre-trained text-to-image model via DDIM image inversion, without an additional image encoder. Benefiting from the high alignment between the extracted ID prompt features and the intermediate features of the text-to-image model, we then embed them efficiently into the base text-to-image model through a carefully designed lightweight attention adapter. We conduct extensive experiments to assess ID fidelity, generation loyalty, speed, and training parameters, all of which show that the proposed Inv-Adapter is highly competitive in ID customization generation and model scale.
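The abstract relies on DDIM image inversion to turn an ID image into a trajectory of diffusion-domain states. As a rough illustration only, the sketch below implements the generic deterministic DDIM inversion update in NumPy. It is not the authors' code: `toy_eps_model` is a hypothetical stand-in for the pre-trained text-to-image denoiser, the linear beta schedule in `make_alphas_cumprod` is an assumed default, and the paper's actual feature extraction from the denoiser's intermediate layers is not shown.

```python
import numpy as np


def make_alphas_cumprod(num_steps=50, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def toy_eps_model(x, t):
    """Hypothetical noise predictor; a real pipeline would call the pre-trained denoiser."""
    return 0.1 * x


def ddim_invert(x0, eps_model, alphas_cumprod):
    """Run deterministic DDIM inversion from a clean input toward noise.

    Returns the list of states [x_0, x_1, ..., x_T], one per noise level.
    """
    x = np.array(x0, dtype=float)
    trajectory = [x]
    a_prev = 1.0  # cumulative alpha of the clean input (no noise yet)
    for t, a_next in enumerate(alphas_cumprod):
        eps = eps_model(x, t)
        # Estimate the clean sample implied by the current state and predicted noise.
        x0_pred = (x - np.sqrt(1.0 - a_prev) * eps) / np.sqrt(a_prev)
        # Re-noise that estimate to the next (higher) noise level using the same eps.
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps
        trajectory.append(x)
        a_prev = a_next
    return trajectory


if __name__ == "__main__":
    image = np.random.default_rng(0).normal(size=(4, 4))
    states = ddim_invert(image, toy_eps_model, make_alphas_cumprod(10))
    print(len(states), states[-1].shape)
```

In a setup like the one the abstract describes, the states along this trajectory would be fed back through the same pre-trained model, so the ID features come from the diffusion model itself rather than from a separate image encoder.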

