CV | Apr 30, 2024

InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

arXiv:2404.19427v1 | 30 citations | h-index: 6
Originality: Highly original
AI Analysis

This paper addresses the challenge of personalized image generation for multiple identities, offering a novel solution for applications such as digital art and social media, though it builds incrementally on existing concept-preservation methods.

The paper tackles the problem of generating images that integrate multiple personal identities (IDs) in a cohesive composition. It introduces InstantFamily, which achieves state-of-the-art performance in zero-shot multi-ID image generation with precise control and scalability.

In personalized image generation, the ability to create images that preserve given concepts has improved significantly. Creating an image that naturally integrates multiple concepts in a cohesive and visually appealing composition nevertheless remains challenging. This paper introduces "InstantFamily," an approach that employs a novel masked cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation. Our method preserves ID effectively because it uses global and local features from a pre-trained face recognition model, integrated with text conditions. In addition, the masked cross-attention mechanism enables precise control of multiple IDs and of composition in the generated images. Experiments demonstrate the effectiveness of InstantFamily in multi-ID image generation while resolving well-known multi-ID generation problems. The model achieves state-of-the-art performance in both single-ID and multi-ID preservation, and it scales to preserving a greater number of IDs than it was originally trained with.
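To make the idea of masked cross-attention concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes one plausible reading of the abstract: each image position may always attend to text tokens, but may attend to an identity's tokens only if the position lies inside that identity's face region. The names `build_mask`, `masked_cross_attention`, `position_region`, and `token_owner` are hypothetical and introduced only for illustration; the paper's actual formulation, feature dimensions, and token layout may differ.

```python
import math


def build_mask(position_region, token_owner):
    """Build a boolean attention mask (hypothetical layout, an assumption).

    position_region[i]: index of the ID whose face region covers image
                        position i, or None for background.
    token_owner[j]:     index of the ID that condition token j belongs to,
                        or None for a text token.
    Text tokens are visible everywhere; ID tokens only inside their region.
    """
    return [
        [owner is None or owner == region for owner in token_owner]
        for region in position_region
    ]


def masked_cross_attention(queries, keys, values, allowed):
    """Scaled dot-product cross-attention restricted by a boolean mask."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for q, row in zip(queries, allowed):
        idx = [j for j, ok in enumerate(row) if ok]
        if not idx:
            raise ValueError("each position must see at least one token")
        # Scores are computed only for the tokens this position may see.
        scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in idx]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        w = [0.0] * len(keys)  # masked tokens keep weight zero
        for j, e in zip(idx, exps):
            w[j] = e / total
        weights.append(w)
        outputs.append(
            [sum(w[j] * values[j][c] for j in idx) for c in range(len(values[0]))]
        )
    return outputs, weights


# Toy setup: three image positions (ID 0 region, ID 1 region, background)
# and three condition tokens (one text token, one token per ID).
position_region = [0, 1, None]
token_owner = [None, 0, 1]
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
values = [[0.5, 0.25], [1.0, 0.0], [0.0, 1.0]]

allowed = build_mask(position_region, token_owner)
outputs, weights = masked_cross_attention(queries, keys, values, allowed)
```

In this toy setup, the position inside ID 0's region gives zero weight to ID 1's token and vice versa, while the background position attends to the text token only. Restricting attention this way is one simple means of keeping the features of different identities from blending into the same face region.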

