PartComposer: Learning and Composing Part-Level Concepts from Single-Image Examples
PartComposer enables more flexible and efficient object composition in image generation, though it appears incremental over existing diffusion model methods.
The paper tackles the problem of learning part-level concepts from single-image examples for text-to-image diffusion models, achieving strong disentanglement and controllable composition that outperforms subject and part-level baselines.
We present PartComposer, a framework for part-level concept learning from single-image examples that enables text-to-image diffusion models to compose novel objects from meaningful components. Existing methods either struggle to learn fine-grained concepts effectively or require a large dataset as input. To address one-shot data scarcity, we propose a dynamic data synthesis pipeline that generates diverse part compositions. Most importantly, we propose to maximize the mutual information between denoised latents and structured concept codes via a concept predictor, which enables direct regulation of concept disentanglement and supervision of re-composition. Our method achieves strong disentanglement and controllable composition, outperforming subject-level and part-level baselines when mixing concepts from the same or different object categories.
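The mutual-information objective can be illustrated with a standard variational lower bound, I(c; z) >= H(c) + E[log q(c | z)], where q is the concept predictor. Since H(c) does not depend on the model, maximizing the bound amounts to minimizing the predictor's negative log-likelihood of the concept code. The sketch below is a minimal illustration and not the paper's implementation: it assumes the concept code is a binary vector marking which part concepts appear in a composition, and that the predictor emits one logit per concept. The function name `concept_code_nll` and the example values are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def concept_code_nll(logits, code):
    """Mean binary cross-entropy between predictor logits and a binary concept code.

    Minimizing this value maximizes E[log q(c | z)], the model-dependent term
    of the variational lower bound on I(c; z).
    Assumption: one independent Bernoulli variable per part concept.
    """
    assert len(logits) == len(code)
    total = 0.0
    for logit, bit in zip(logits, code):
        p = sigmoid(logit)
        # Clamp to avoid log(0) for extreme logits.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= bit * math.log(p) + (1 - bit) * math.log(1.0 - p)
    return total / len(code)


# Hypothetical setup: four part concepts, composition uses concepts 0 and 2.
code = [1, 0, 1, 0]
good_logits = [4.0, -4.0, 4.0, -4.0]   # predictor recovers the code
bad_logits = [-4.0, 4.0, -4.0, 4.0]    # predictor confuses the concepts

loss_good = concept_code_nll(good_logits, code)
loss_bad = concept_code_nll(bad_logits, code)
print(loss_good < loss_bad)
```

In a training loop, a term of this kind would be added to the diffusion loss so that each denoised latent stays predictive of the parts it was composed from; how the two terms are weighted is left open here.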