CV, Jan 28

Say Cheese! Detail-Preserving Portrait Collection Generation via Natural Language Edits

arXiv:2601.20511v1, 1 citation, h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses the demand for intuitive portrait creation on social media. It defines a new task and contributes a dataset and a method for complex edits with detail preservation, though the advance is incremental because it builds on existing text-guided generation techniques.

The paper tackles the problem of generating diverse, high-quality portrait collections by editing a reference image with natural language instructions, introducing the Portrait Collection Generation (PCG) task and proposing the CHEESE dataset and SCheese framework, which achieve state-of-the-art performance.

As social media platforms proliferate, users increasingly demand intuitive ways to create diverse, high-quality portrait collections. In this work, we introduce Portrait Collection Generation (PCG), a novel task that generates coherent portrait collections by editing a reference portrait image through natural language instructions. This task poses two unique challenges to existing methods: (1) complex multi-attribute modifications such as pose, spatial layout, and camera viewpoint; and (2) high-fidelity detail preservation including identity, clothing, and accessories. To address these challenges, we propose CHEESE, the first large-scale PCG dataset containing 24K portrait collections and 573K samples with high-quality modification text annotations, constructed through a Large Vision-Language Model-based pipeline with inversion-based verification. We further propose SCheese, a framework that combines text-guided generation with hierarchical identity and detail preservation. SCheese employs an adaptive feature fusion mechanism to maintain identity consistency, and ConsistencyNet to inject fine-grained features for detail consistency. Comprehensive experiments validate the effectiveness of CHEESE in advancing PCG, with SCheese achieving state-of-the-art performance.

