CV | Apr 9, 2024

SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions

arXiv:2404.06451v1 | 26 citations | h-index: 16 | Has Code | ECCV
Originality: Incremental advance
AI Analysis

The paper addresses a specific bottleneck in text-to-image generation for users who need precise control over layouts, but the advance is incremental because it builds on the existing ControlNet method.

The paper tackles degraded image generation when text prompts conflict with rough visual conditions in layout-controllable text-to-image models. It presents SmartControl, which relaxes the visual condition in conflicting areas so that the output can adapt to the prompt, and reports improvements over state-of-the-art methods in experiments across four condition types.

Human visual imagination usually begins with analogies or rough sketches. For example, given an image of a girl playing guitar in front of a building, one may analogously imagine how it would look if Iron Man were playing guitar in front of a pyramid in Egypt. Nonetheless, the visual condition may not be precisely aligned with the imagined result indicated by the text prompt, and existing layout-controllable text-to-image (T2I) generation models are prone to producing degraded results with obvious artifacts. To address this issue, we present a novel T2I generation method dubbed SmartControl, which is designed to modify rough visual conditions to adapt to the text prompt. The key idea of SmartControl is to relax the visual condition in the areas that conflict with the text prompt. Specifically, a Control Scale Predictor (CSP) is designed to identify the conflict regions and predict the local control scales, while a dataset with text prompts and rough visual conditions is constructed for training the CSP. It is worth noting that, even with a limited number (e.g., 1,000 to 2,000) of training samples, SmartControl can generalize well to unseen objects. Extensive experiments on four typical visual condition types clearly show the efficacy of SmartControl against state-of-the-art methods. Source code, pre-trained models, and datasets are available at https://github.com/liuxiaoyu1104/SmartControl.
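
The idea of local control scales can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the predicted scales are applied, so the code assumes they multiply the control-branch features elementwise before those features are added to the backbone features. The function name `apply_local_control` and the toy values are hypothetical.

```python
def apply_local_control(backbone, control, scale_map):
    """Fuse backbone and control features using a per-pixel control scale.

    backbone, control: nested lists shaped (channels, height, width).
    scale_map: nested list shaped (height, width); 0 ignores the visual
    condition at that pixel, 1 applies it at full strength.
    ASSUMPTION: fused = backbone + scale * control, applied per pixel.
    """
    fused = []
    for b_channel, c_channel in zip(backbone, control):
        fused.append([
            [b + s * c for b, c, s in zip(b_row, c_row, s_row)]
            for b_row, c_row, s_row in zip(b_channel, c_channel, scale_map)
        ])
    return fused


# Toy example: one channel, 2 x 4 pixels.
backbone = [[[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]]
control = [[[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]]

# Hypothetical predictor output: the left half conflicts with the prompt,
# so the visual condition is fully relaxed there (scale 0).
scale_map = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]

fused = apply_local_control(backbone, control, scale_map)

# A uniform map of ones reduces to a single global scale of 1.
uniform = apply_local_control(backbone, control, [[1.0] * 4, [1.0] * 4])
```

In this toy case the left half of `fused` equals the backbone features unchanged, while the right half still receives the full control signal. Under the same assumption, a single global scale is the special case where every entry of `scale_map` is identical.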

Code Implementations: 1 repo
