CV · Aug 22, 2024

GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections

arXiv:2408.12352v2 · 16 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This paper addresses a domain-specific problem in fashion and design applications, offering incremental improvements in garment generation.

The paper addresses fine-grained semantic misalignment in text-to-garment generation: state-of-the-art models fail to generate garment components with the correct quantity, position, and interrelations. It proposes GarmentAligner, which the authors report achieves superior fidelity and alignment compared to existing competitors.

General text-to-image models bring revolutionary innovation to the fields of arts, design, and media. However, when applied to garment generation, even the state-of-the-art text-to-image models suffer from fine-grained semantic misalignment, particularly concerning the quantity, position, and interrelations of garment components. Addressing this, we propose GarmentAligner, a text-to-garment diffusion model trained with retrieval-augmented multi-level corrections. To achieve semantic alignment at the component level, we introduce an automatic component extraction pipeline to obtain spatial and quantitative information of garment components from corresponding images and captions. Subsequently, to exploit component relationships within the garment images, we construct retrieval subsets for each garment by retrieval augmentation based on component-level similarity ranking and conduct contrastive learning to enhance the model perception of components from positive and negative samples. To further enhance the alignment of components across semantic, spatial, and quantitative granularities, we propose the utilization of multi-level correction losses that leverage detailed component information. The experimental findings demonstrate that GarmentAligner achieves superior fidelity and fine-grained semantic alignment when compared to existing competitors.
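
The abstract does not specify how the component-level similarity ranking is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each garment is summarized by a count of its components, and it uses a hypothetical similarity score (the inverse of one plus the L1 distance between counts). The names `component_similarity`, `build_retrieval_subset`, and the sample garments are all illustrative assumptions.

```python
from collections import Counter


def component_similarity(a: Counter, b: Counter) -> float:
    """Hypothetical score: 1 / (1 + L1 distance between component counts).

    This metric is an assumption for illustration; the paper's actual
    similarity measure is not given in the abstract.
    """
    keys = set(a) | set(b)
    # Counter returns 0 for missing keys, so absent components count as zero.
    distance = sum(abs(a[k] - b[k]) for k in keys)
    return 1.0 / (1.0 + distance)


def build_retrieval_subset(anchor_id, components, k=1):
    """Rank all other garments by similarity to the anchor.

    Returns (positives, negatives): the k most similar and the k least
    similar garment ids, which could serve as contrastive samples.
    """
    anchor = components[anchor_id]
    ranked = sorted(
        (gid for gid in components if gid != anchor_id),
        key=lambda gid: component_similarity(anchor, components[gid]),
        reverse=True,
    )
    return ranked[:k], ranked[-k:]


# Made-up component counts, as an extraction pipeline might produce them.
garments = {
    "shirt_a": Counter(pocket=2, button=6),
    "shirt_b": Counter(pocket=2, button=5),
    "coat_c": Counter(pocket=4, button=3, belt=1),
    "dress_d": Counter(bow=1),
}

positives, negatives = build_retrieval_subset("shirt_a", garments, k=1)
print(positives, negatives)
```

In this toy data, the garment whose component counts are closest to the anchor becomes the positive sample and the most dissimilar one becomes the negative sample. A contrastive objective would then pull the anchor toward the former and push it away from the latter.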
