CV | Jul 8, 2025

TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model

arXiv:2507.05790v1 | h-index: 12 | ICME
Originality: Incremental advance
AI Analysis

The paper addresses users' need for more versatile and flexible virtual try-on systems, though it appears incremental because it builds on existing multimodal and large language model capabilities.

It tackles multifunctional virtual try-on guided solely by text instructions, covering both full outfit changes and local editing, and reports better semantic consistency and visual quality than current methods.

Virtual try-on has made significant progress in recent years. This paper addresses how to achieve multifunctional virtual try-on guided solely by text instructions, including full outfit change and local editing. Previous methods primarily relied on end-to-end networks to perform single try-on tasks, lacking versatility and flexibility. We propose TalkFashion, an intelligent try-on assistant that leverages the powerful comprehension capabilities of large language models to analyze user instructions and determine which task to execute, thereby activating different processing pipelines accordingly. Additionally, we introduce an instruction-based local repainting model that eliminates the need for users to manually provide masks. With the help of multimodal models, this approach achieves fully automated local editing, enhancing the flexibility of editing tasks. The experimental results demonstrate better semantic consistency and visual quality compared to current methods.
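
The routing idea in the abstract (an instruction is analyzed, a task is chosen, and the matching pipeline is activated) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`TryOnRequest`, `classify_instruction`, `PIPELINES`, `handle`) is hypothetical, the keyword heuristic is only a stand-in for the large language model's decision, and the two pipeline functions are placeholders that return a description string instead of producing an image.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Task labels taken from the two tasks named in the abstract.
FULL_OUTFIT = "full_outfit_change"
LOCAL_EDIT = "local_editing"


@dataclass
class TryOnRequest:
    instruction: str   # free-form user text
    person_image: str  # path or identifier of the person image (assumed input)


def classify_instruction(instruction: str) -> str:
    """Stand-in for the LLM's task decision.

    ASSUMPTION: a keyword heuristic replaces the language model so the
    sketch runs without any external service.
    """
    text = instruction.lower()
    local_cues = ("sleeve", "collar", "color", "colour", "pattern", "length", "pocket")
    if any(cue in text for cue in local_cues):
        return LOCAL_EDIT
    return FULL_OUTFIT


def run_full_outfit_change(req: TryOnRequest) -> str:
    # Placeholder: a real pipeline would synthesize a new image.
    return f"[full outfit change] {req.person_image}: {req.instruction}"


def run_local_editing(req: TryOnRequest) -> str:
    # Placeholder: a real pipeline would locate the region automatically
    # (no user-supplied mask) and repaint it.
    return f"[local editing] {req.person_image}: {req.instruction}"


# Dispatch table mapping each task label to its pipeline.
PIPELINES: Dict[str, Callable[[TryOnRequest], str]] = {
    FULL_OUTFIT: run_full_outfit_change,
    LOCAL_EDIT: run_local_editing,
}


def handle(req: TryOnRequest) -> Tuple[str, str]:
    """Pick a task for the instruction and run the matching pipeline."""
    task = classify_instruction(req.instruction)
    return task, PIPELINES[task](req)


if __name__ == "__main__":
    for text in ("Dress me in a red evening gown", "Make the sleeves shorter"):
        print(handle(TryOnRequest(text, "person.png")))
```

The dispatch table keeps the task decision separate from the pipelines, so under these assumptions the heuristic could be swapped for a model call without touching the pipeline code.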

