Control Color: Multimodal Diffusion-based Interactive Image Colorization
The paper addresses the problem of controllable, high-quality image colorization for creative and editing applications, though its contribution is incremental in that it builds on pre-trained Stable Diffusion models.
The paper tackles limitations of existing image colorization methods, such as lack of user interaction and color overflow, by introducing Control Color (CtrlColor), a multimodal diffusion-based approach that accepts text prompts, strokes, and exemplars as conditions. The authors report that it outperforms state-of-the-art methods in both qualitative and quantitative comparisons.
Despite the existence of numerous colorization methods, several limitations still exist, such as lack of user interaction, inflexibility in local colorization, unnatural color rendering, insufficient color variation, and color overflow. To solve these issues, we introduce Control Color (CtrlColor), a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model, offering promising capabilities in highly controllable interactive image colorization. While several diffusion-based methods have been proposed, supporting colorization in multiple modalities remains non-trivial. In this study, we aim to tackle both unconditional and conditional image colorization (text prompts, strokes, exemplars) and address color overflow and incorrect color within a unified framework. Specifically, we present an effective way to encode user strokes to enable precise local color manipulation and employ a practical way to constrain the color distribution similar to exemplars. Apart from accepting text prompts as conditions, these designs add versatility to our approach. We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring. Extensive comparisons show that our model outperforms state-of-the-art image colorization methods both qualitatively and quantitatively.
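As a rough illustration of what encoding user strokes for local color control can look like, the sketch below builds a color hint map and a binary stroke mask from a grayscale image. The abstract does not specify CtrlColor's actual encoding, so this is a generic example under stated assumptions rather than the authors' method; `encode_strokes`, the `(row, col, rgb)` stroke format, and the list-based image layout are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]
Stroke = Tuple[int, int, Color]  # hypothetical format: (row, col, RGB in [0, 1])


def encode_strokes(
    gray: Sequence[Sequence[float]],
    strokes: Sequence[Stroke],
) -> Tuple[List[List[Color]], List[List[int]]]:
    """Return (hint, mask) for a grayscale image and a set of stroke pixels.

    hint: H x W grid of RGB tuples; stroke pixels carry the user's color,
          all other pixels carry the gray value replicated on three channels.
    mask: H x W grid with 1 where the user drew a stroke and 0 elsewhere.

    Assumption: this is a generic hint-map-plus-mask encoding, not the
    encoding used by CtrlColor, which the abstract does not describe.
    """
    height, width = len(gray), len(gray[0])
    hint = [[(v, v, v) for v in row] for row in gray]
    mask = [[0] * width for _ in range(height)]
    for r, c, color in strokes:
        # Ignore stroke pixels that fall outside the image bounds.
        if not (0 <= r < height and 0 <= c < width):
            continue
        hint[r][c] = tuple(color)
        mask[r][c] = 1
    return hint, mask


# Usage: one red stroke pixel inside a 2x2 image, one pixel out of bounds.
gray = [[0.2, 0.4], [0.6, 0.8]]
hint, mask = encode_strokes(
    gray,
    [(0, 1, (1.0, 0.0, 0.0)), (5, 5, (0.0, 1.0, 0.0))],
)
```

A hint map and mask of this kind could be supplied to a colorization model as extra conditioning alongside the grayscale input, with the mask telling the model where the user's colors must be respected. How CtrlColor itself consumes strokes is not stated in the abstract.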