CV · Dec 12, 2023

CCM: Adding Conditional Controls to Text-to-Image Consistency Models

Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Yu Liu, Xueyang Fu, Zheng-Jun Zha

arXiv:2312.06971v1 · 10.411 citations · h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses the need for conditional control in efficient generative models, but it is incremental as it adapts existing ControlNet techniques to CMs.

The paper tackles the problem of adding conditional controls to pretrained Consistency Models (CMs) for text-to-image generation. It finds that a ControlNet trained for diffusion models can be applied to CMs for high-level semantic control but struggles with low-level detail, and it proposes two alternatives: training ControlNet from scratch with Consistency Training, or using a lightweight adapter to transfer a diffusion-based ControlNet to CMs efficiently.

Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality. However, how to add new conditional controls to pretrained CMs has not been explored. In this technical report, we consider alternative strategies for adding ControlNet-like conditional control to CMs and present three significant findings. 1) ControlNet trained for diffusion models (DMs) can be directly applied to CMs for high-level semantic controls but struggles with low-level detail and realism control. 2) CMs serve as an independent class of generative models, based on which ControlNet can be trained from scratch using Consistency Training proposed by Song et al. 3) A lightweight adapter can be jointly optimized under multiple conditions through Consistency Training, allowing for the swift transfer of DM-based ControlNet to CMs. We study these three solutions across various conditional controls, including edge, depth, human pose, low-resolution image, and masked image, with text-to-image latent consistency models.
