CV | Mar 31, 2025

Consistent Subject Generation via Contrastive Instantiated Concepts

Lee Hsin-Ying, Kelvin C. K. Chan, Ming-Hsuan Yang

arXiv:2503.24387v1 | 1 citation | h-index: 4 | Trans. Mach. Learn. Res.

Originality: Incremental advance

AI Analysis

The work addresses subject variation across generations, which limits the use of text-to-image models for long content generation. It appears incremental because it builds on existing approaches.

The paper tackles the problem of generating consistent subjects across multiple independent creations in text-to-image models. It introduces Contrastive Concept Instantiation (CoCoIns), which performs comparably to existing methods while offering higher flexibility.

While text-to-image generative models can synthesize diverse and faithful contents, subject variation across multiple creations limits the application in long content generation. Existing approaches require time-consuming tuning, references for all subjects, or access to other creations. We introduce Contrastive Concept Instantiation (CoCoIns) to effectively synthesize consistent subjects across multiple independent creations. The framework consists of a generative model and a mapping network, which transforms input latent codes into pseudo-words associated with certain instances of concepts. Users can generate consistent subjects with the same latent codes. To construct such associations, we propose a contrastive learning approach that trains the network to differentiate the combination of prompts and latent codes. Extensive evaluations of human faces with a single subject show that CoCoIns performs comparably to existing methods while maintaining higher flexibility. We also demonstrate the potential of extending CoCoIns to multiple subjects and other object categories.
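The two ideas in the abstract, a mapping network that turns a latent code into a pseudo-word and a contrastive objective that ties each code to one instance, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the single random linear layer, the names `make_mapping_network` and `info_nce`, the dimensions, and the use of a generic InfoNCE loss applied directly to pseudo-word vectors. It is not the authors' architecture or training objective, which contrasts combinations of prompts and latent codes and involves the generative model.

```python
import math
import random


def make_mapping_network(latent_dim, embed_dim, seed=0):
    """Hypothetical stand-in for a mapping network: one fixed linear layer + tanh."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
               for _ in range(embed_dim)]

    def mapping(latent):
        # Deterministic: the same latent code always yields the same pseudo-word.
        return [math.tanh(sum(w * z for w, z in zip(row, latent)))
                for row in weights]

    return mapping


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: small when the anchor is closer to the positive
    than to any negative. Used here as an assumed contrastive objective."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[0]


rng = random.Random(42)
mapping = make_mapping_network(latent_dim=8, embed_dim=16)

z_a = [rng.gauss(0.0, 1.0) for _ in range(8)]  # latent code for subject A
z_b = [rng.gauss(0.0, 1.0) for _ in range(8)]  # latent code for subject B

word_a1 = mapping(z_a)  # pseudo-word used in a first creation
word_a2 = mapping(z_a)  # same code reused in an independent creation
word_b = mapping(z_b)   # a different code, hence a different instance

# Pairing matching codes as positives gives a lower loss than pairing
# mismatched codes, which is the signal a contrastive objective exploits.
loss_match = info_nce(word_a1, word_a2, [word_b])
loss_mismatch = info_nce(word_a1, word_b, [word_a2])
```

In this toy setting, reusing `z_a` reproduces the same pseudo-word exactly, which mirrors the claim that users can generate consistent subjects with the same latent codes. In the actual framework the pseudo-word would be combined with the text prompt and passed to the generative model, a step this sketch omits.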
