Context Diffusion: In-Context Aware Image Generation
The work addresses the challenge of enabling image generation models to learn effectively from visual examples in context. The contribution is incremental: it builds on recent in-context learning work but introduces a novel mechanism that separates visual context encoding from layout preservation.
The paper tackles in-context learning for image generation, where existing models struggle to learn from visual context when no text prompt is given. It proposes Context Diffusion, a framework that separates visual context encoding from layout preservation, which improves image quality and context fidelity on both in-domain and out-of-domain tasks.
We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is provided alongside context examples and text prompts. However, the quality and context fidelity of the generated images deteriorate when the prompt is absent, which shows that these models cannot truly learn from the visual context. To address this, we propose a novel framework that separates the encoding of the visual context from the preservation of the desired image layout. As a result, the model can learn from the visual context and the prompt together, or from either one alone. We also extend the model to few-shot settings so that it can address diverse in-context learning scenarios. Our experiments and human evaluation demonstrate that Context Diffusion excels in both in-domain and out-of-domain tasks, yielding an overall improvement in image quality and context fidelity compared to counterpart models.
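The separation described in the abstract can be illustrated with a minimal sketch. This is not the paper's architecture: the names `Conditioning` and `build_conditioning`, the use of plain float lists as embeddings, and the choice to concatenate prompt and context tokens are all assumptions made for illustration. The sketch only shows the idea that the query image's layout travels on its own path, while the content signal may come from the prompt, from one or more context examples (few-shot), or from both.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


@dataclass
class Conditioning:
    """Two separate inputs for a hypothetical generator (illustrative only)."""
    tokens: List[Vector]        # content signal: prompt and/or context embeddings
    layout: List[List[float]]   # structure signal: taken from the query image


def build_conditioning(
    query_layout: Sequence[Sequence[float]],
    context_embeddings: Sequence[Sequence[float]] = (),
    prompt_embeddings: Sequence[Sequence[float]] = (),
) -> Conditioning:
    """Assemble conditioning from whichever content sources are present.

    Assumption: embeddings are precomputed by some encoder that is not
    modelled here, and all of them share one dimensionality.
    """
    # Content path: prompt tokens first, then one token per context example.
    tokens = [list(v) for v in prompt_embeddings]
    tokens += [list(v) for v in context_embeddings]
    if not tokens:
        raise ValueError("provide a prompt, at least one context example, or both")
    if len({len(t) for t in tokens}) != 1:
        raise ValueError("all embeddings must share one dimensionality")
    # Layout path: kept apart from the content tokens, never mixed with them.
    return Conditioning(tokens=tokens, layout=[list(row) for row in query_layout])


# Toy inputs: a 2x2 "layout" map, three context examples, one prompt token.
layout = [[0.0, 1.0], [1.0, 0.0]]
ctx = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
prompt = [[1.0, 0.0, 0.0]]

both = build_conditioning(layout, ctx, prompt)
context_only = build_conditioning(layout, context_embeddings=ctx)
prompt_only = build_conditioning(layout, prompt_embeddings=prompt)
print(len(both.tokens), len(context_only.tokens), len(prompt_only.tokens))  # prints: 4 3 1
```

In this toy setup the layout is identical across all three calls, and only the content tokens change. That mirrors the claim that the model should work with the prompt, the visual context, or both, and with a variable number of context examples.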