CV · Dec 5, 2024

AIpparel: A Multimodal Foundation Model for Digital Garments

ETH Zurich
arXiv:2412.03937v5 · 3 citations · h-index: 10
Originality: Incremental advance
AI Analysis

The paper addresses inefficient garment design, a problem for fashion designers and creators. The advance is incremental because it builds on existing large multimodal models.

The paper tackles the time-consuming manual process of garment design by introducing AIpparel, a multimodal foundation model for generating and editing sewing patterns. AIpparel is obtained by fine-tuning large multimodal models on a dataset of over 120,000 garments, and it achieves state-of-the-art performance in text-to-garment and image-to-garment tasks.

Apparel is essential to human life, offering protection, mirroring cultural identities, and showcasing personal style. Yet, the creation of garments remains a time-consuming process, largely due to the manual work involved in designing them. To simplify this process, we introduce AIpparel, a multimodal foundation model for generating and editing sewing patterns. Our model fine-tunes state-of-the-art large multimodal models (LMMs) on a custom-curated large-scale dataset of over 120,000 unique garments, each with multimodal annotations including text, images, and sewing patterns. Additionally, we propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently. AIpparel achieves state-of-the-art performance in single-modal tasks, including text-to-garment and image-to-garment prediction, and enables novel multimodal garment generation applications such as interactive garment editing. The project website is at https://georgenakayama.github.io/AIpparel/.
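
To make the idea of tokenizing a sewing pattern concrete, here is a minimal sketch of one way a pattern could be flattened into a sequence that a language model can predict. It is not the scheme proposed in the paper: the `Edge` and `Panel` classes, the token names (`<garment>`, `<panel>`, `<line>`, `<curve>`), and the choice to keep continuous edge parameters in a separate list are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Edge:
    # Hypothetical edge record: "line" or "curve", plus its continuous
    # parameters (endpoint, and a control point for curves).
    kind: str
    params: Tuple[float, ...]


@dataclass
class Panel:
    # Hypothetical panel record: a closed loop of edges.
    edges: List[Edge]


def tokenize(panels: List[Panel]) -> Tuple[List[str], List[Tuple[float, ...]]]:
    """Flatten panels into structural tokens plus a parallel parameter list.

    Structure (panel boundaries, edge types) becomes discrete tokens;
    each edge's continuous parameters are kept aside in the same order.
    """
    tokens = ["<garment>"]
    params = []
    for panel in panels:
        tokens.append("<panel>")
        for edge in panel.edges:
            tokens.append(f"<{edge.kind}>")  # one token per edge type
            params.append(edge.params)       # parameters stay continuous
        tokens.append("</panel>")
    tokens.append("</garment>")
    return tokens, params


# Toy example: a single panel with two straight edges and one curved edge.
front = Panel([
    Edge("line", (0.0, 50.0)),
    Edge("curve", (30.0, 50.0, 15.0, 55.0)),
    Edge("line", (30.0, 0.0)),
])
tokens, params = tokenize([front])
print(tokens)
# ['<garment>', '<panel>', '<line>', '<curve>', '<line>', '</panel>', '</garment>']
```

In this sketch the sequence length grows with the number of panels and edges rather than with the number of coordinate digits, which is one plausible reading of what a concise encoding buys; the paper should be consulted for the actual design.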

