CharCom: Composable Identity Control for Multi-Character Story Illustration
This work addresses character identity consistency, a recurring problem in applications such as story illustration and animation. It appears incremental, however, because it builds on existing diffusion models with modular adapters.
The paper tackles the problem of maintaining character identity consistency in diffusion-based text-to-image generation for multi-character story illustration. The result is CharCom, a framework that, according to the authors, significantly improves character fidelity, semantic alignment, and temporal coherence with minimal overhead.
Ensuring character identity consistency across varying prompts remains a fundamental challenge in diffusion-based text-to-image generation. We propose CharCom, a modular and parameter-efficient framework that achieves character-consistent story illustration through composable LoRA adapters, enabling efficient per-character customization without retraining the base model. Built on a frozen diffusion backbone, CharCom dynamically composes adapters at inference using prompt-aware control. Experiments on multi-scene narratives demonstrate that CharCom significantly enhances character fidelity, semantic alignment, and temporal coherence. It remains robust in crowded scenes and enables scalable multi-character generation with minimal overhead, making it well-suited for real-world applications such as story illustration and animation.
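The core idea of composing per-character LoRA adapters on a frozen backbone can be illustrated with a small sketch. It assumes the standard LoRA form, where each adapter contributes a low-rank update scale · B·A to a frozen weight W, and it substitutes simple keyword matching for CharCom's prompt-aware control, whose exact mechanism is not described in this summary. The names `LoRAAdapter`, `select_adapters`, and `compose`, as well as the `trigger` field, are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


@dataclass
class LoRAAdapter:
    """Hypothetical per-character adapter contributing delta W = scale * (B @ A)."""
    trigger: str     # assumed token that names the character in a prompt
    A: Matrix        # r x d_in down-projection
    B: Matrix        # d_out x r up-projection
    scale: float = 1.0

    def delta(self) -> Matrix:
        # Low-rank update with the same shape as the frozen weight (d_out x d_in).
        return [[self.scale * v for v in row] for row in matmul(self.B, self.A)]


def select_adapters(prompt: str, bank: Dict[str, LoRAAdapter]) -> List[LoRAAdapter]:
    """Stand-in for prompt-aware control: keep adapters whose trigger word appears."""
    words = set(prompt.lower().replace(",", " ").replace(".", " ").split())
    return [ad for ad in bank.values() if ad.trigger.lower() in words]


def compose(base: Matrix, adapters: List[LoRAAdapter]) -> Matrix:
    """Return base + sum of adapter deltas; the frozen base matrix is left untouched."""
    out = [row[:] for row in base]  # copy so the backbone weight is never modified
    for ad in adapters:
        for i, row in enumerate(ad.delta()):
            for j, v in enumerate(row):
                out[i][j] += v
    return out


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 frozen "backbone" weight
    bank = {
        "alice": LoRAAdapter("alice", A=[[1.0, 0.0]], B=[[0.5], [0.0]]),
        "bob": LoRAAdapter("bob", A=[[0.0, 1.0]], B=[[0.0], [0.25]]),
    }
    active = select_adapters("Alice and Bob walk to the river.", bank)
    print([ad.trigger for ad in active])  # ['alice', 'bob']
    print(compose(W, active))             # [[1.5, 0.0], [0.0, 1.25]]
```

Because adapters are only summed into a copy at inference, adding a new character means training one small (A, B) pair rather than retraining the backbone. A global sum like this applies every selected identity to the whole layer; scoping adapters to specific tokens or regions is one way to limit interference between characters, but whether CharCom does so is not stated here.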