EditIDv2: Editable ID Customization with Data-Lubricated ID Feature Integration for Text-to-Image Generation
This work addresses the needs of users who require precise character customization in AI-generated images from detailed narratives, though the contribution is incremental, building on prior work.
The paper tackles character editing in text-to-image generation for complex narrative scenes with long text inputs. It reports high-quality image generation and identity consistency using minimal data lubrication, with strong results on the IBench evaluation.
We propose EditIDv2, a tuning-free solution designed for high-complexity narrative scenes and long text inputs. Existing character editing methods perform well under simple prompts but often suffer from degraded editing capability, biased semantic understanding, and breakdowns in identity consistency when faced with long narratives containing multiple semantic layers, temporal logic, and complex contextual relationships. In EditID, we analyzed the impact of the ID integration module on editability. In EditIDv2, we further explore and address the influence of the ID feature integration module. The core question of EditIDv2 is how to inject editability under minimal data lubrication. Through a decomposition of PerceiverAttention, the introduction of an ID loss with joint dynamic training alongside the diffusion model, and an offline fusion strategy for the integration module, we achieve deep, multi-level semantic editing while maintaining identity consistency in complex narrative environments, using only a small amount of data lubrication. This meets the demands of long prompts and high-quality image generation, and achieves excellent results on the IBench evaluation.
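The abstract does not state the form of the ID loss or how it is weighted against the diffusion objective. As a minimal sketch only, assuming the common choice of one minus the cosine similarity between identity embeddings of the generated and reference faces, a joint objective could be written as follows. The function names, the plain-list embeddings, and the `id_weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def id_loss(gen_embedding, ref_embedding):
    """Assumed ID loss: 0 when the identities align, up to 2 when opposite."""
    return 1.0 - cosine_similarity(gen_embedding, ref_embedding)


def joint_loss(diffusion_loss, gen_embedding, ref_embedding, id_weight=1.0):
    """Assumed joint objective: diffusion loss plus a weighted ID term.

    id_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return diffusion_loss + id_weight * id_loss(gen_embedding, ref_embedding)
```

In this sketch, a larger `id_weight` pushes training toward identity preservation, while a smaller one leaves more room for the prompt to drive the edit. That is the editability versus consistency trade-off the abstract describes.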