CV · Nov 27, 2024

ROICtrl: Boosting Instance Control for Visual Generation

arXiv:2411.17949v1 · 8 citations · h-index: 11 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses a limitation of visual generation models in applications that require complex multi-instance compositions. It is an incremental advance that integrates a new operation (ROI-Unpool) into an adapter for pretrained diffusion models.

The paper tackles inaccurate positional and attribute association for multiple instances in text-based visual generation. It introduces ROICtrl, an adapter for diffusion models that enables precise regional instance control from bounding boxes paired with captions, and reports superior performance at significantly lower computational cost.

Natural language often struggles to accurately associate positional and attribute information with multiple instances, which limits current text-based visual generation models to simpler compositions featuring only a few dominant instances. To address this limitation, this work enhances diffusion models by introducing regional instance control, where each instance is governed by a bounding box paired with a free-form caption. Previous methods in this area typically rely on implicit position encoding or explicit attention masks to separate regions of interest (ROIs), resulting in either inaccurate coordinate injection or large computational overhead. Inspired by ROI-Align in object detection, we introduce a complementary operation called ROI-Unpool. Together, ROI-Align and ROI-Unpool enable explicit, efficient, and accurate ROI manipulation on high-resolution feature maps for visual generation. Building on ROI-Unpool, we propose ROICtrl, an adapter for pretrained diffusion models that enables precise regional instance control. ROICtrl is compatible with community-finetuned diffusion models, as well as with existing spatial-based add-ons (e.g., ControlNet, T2I-Adapter) and embedding-based add-ons (e.g., IP-Adapter, ED-LoRA), extending their applications to multi-instance generation. Experiments show that ROICtrl achieves superior performance in regional instance control while significantly reducing computational costs.
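The abstract describes ROI-Align and ROI-Unpool only at a high level. The NumPy sketch below illustrates the pairing under stated assumptions; it is not the authors' implementation. ROI-Align here takes one bilinear sample per output bin (torchvision's `roi_align` can average several samples per bin via its `sampling_ratio` parameter). ROI-Unpool is read here as the inverse resampling, which writes a fixed-size ROI feature back onto the box's footprint in the full-resolution map. The names `bilinear_sample`, `roi_align`, and `roi_unpool` are hypothetical, and the box convention `(x1, y1, x2, y2)` with pixel `i` covering `[i, i+1)` is an assumption.

```python
import numpy as np


def bilinear_sample(feat, ys, xs):
    """Bilinearly sample feat (C, H, W) at float index coords ys, xs (same shape)."""
    _, H, W = feat.shape
    ys = np.clip(ys, 0.0, H - 1.0)  # clamp so edge samples replicate the border
    xs = np.clip(xs, 0.0, W - 1.0)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy, wx = ys - y0, xs - x0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bot = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bot * wy


def roi_align(feat, box, out_size):
    """Pool the box region of feat into a fixed (C, oh, ow) grid, one sample per bin."""
    x1, y1, x2, y2 = box
    oh, ow = out_size
    # Bin centers in continuous coords, shifted by -0.5 into pixel-center index space.
    ys = y1 + (np.arange(oh) + 0.5) * (y2 - y1) / oh - 0.5
    xs = x1 + (np.arange(ow) + 0.5) * (x2 - x1) / ow - 0.5
    gy, gx = np.meshgrid(ys, xs, indexing="ij")
    return bilinear_sample(feat, gy, gx)


def roi_unpool(roi_feat, box, canvas):
    """Resample roi_feat (C, oh, ow) back onto the pixels of canvas (C, H, W) inside box."""
    _, oh, ow = roi_feat.shape
    _, H, W = canvas.shape
    x1, y1, x2, y2 = box
    out = canvas.copy()
    ys, xs = np.arange(H) + 0.5, np.arange(W) + 0.5  # pixel centers
    rows = np.where((ys >= y1) & (ys < y2))[0]
    cols = np.where((xs >= x1) & (xs < x2))[0]
    if rows.size == 0 or cols.size == 0:
        return out  # box covers no pixel centers; nothing to write
    # Map each covered pixel center into roi_feat's bin-center index space.
    ry = (ys[rows] - y1) / (y2 - y1) * oh - 0.5
    rx = (xs[cols] - x1) / (x2 - x1) * ow - 0.5
    gy, gx = np.meshgrid(ry, rx, indexing="ij")
    out[:, rows[:, None], cols[None, :]] = bilinear_sample(roi_feat, gy, gx)
    return out


if __name__ == "__main__":
    feat = np.random.default_rng(0).standard_normal((4, 16, 16))
    canvas = np.zeros_like(feat)
    # Two hypothetical instance boxes: each ROI is pooled to 8x8, processed on its
    # own, then unpooled onto a shared canvas (later boxes overwrite overlaps).
    for box in [(1, 1, 9, 7), (6.5, 4.0, 15.0, 15.5)]:
        roi = roi_align(feat, box, (8, 8))
        roi = roi * 2.0  # stand-in for per-instance processing (hypothetical)
        canvas = roi_unpool(roi, box, canvas)
    print(canvas.shape)  # (4, 16, 16)
```

Because each instance is processed on a fixed-size grid and written back only inside its box, the work per instance in this sketch scales with the ROI grid size rather than with a full-resolution attention mask. How ROICtrl fuses overlapping instances is not specified in the abstract; the sketch simply lets later boxes overwrite earlier ones.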
