IR · CL · Apr 8, 2022

Contrastive language and vision learning of general fashion concepts

Stanford
arXiv:2204.03972v4 · 73 citations · h-index: 29
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more generalizable product representations in the fashion industry, though its contribution is incremental: it adapts an existing method (CLIP) to a specific domain.

The authors tackled the reliance on specialized, task-specific supervised learning in fashion e-commerce by developing FashionCLIP, a CLIP-like model that learns transferable product representations. They demonstrated it on retrieval, classification, and grounding tasks and released the model and code publicly.
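To make the retrieval and classification tasks concrete, here is a minimal sketch of how a trained CLIP-like model is typically used. Images and text share one embedding space, so zero-shot classification picks the label prompt closest to an image, and text-to-image retrieval ranks catalog items by similarity to a query. The embeddings are toy 3-dimensional vectors standing in for encoder outputs; the function names, prompts, and file names are hypothetical and not part of the FashionCLIP release.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def zero_shot_classify(image_emb: Sequence[float],
                       label_embs: Dict[str, Sequence[float]]) -> str:
    """Return the label prompt whose text embedding is closest to the image."""
    return max(label_embs, key=lambda label: cosine(image_emb, label_embs[label]))


def retrieve(query_emb: Sequence[float],
             catalog: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Rank catalog items (e.g. product images) by similarity to a text query."""
    ranked = sorted(catalog, key=lambda item: cosine(query_emb, catalog[item]),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy embeddings standing in for real text/image encoder outputs.
label_embs = {
    "a photo of a dress": [0.9, 0.1, 0.0],
    "a photo of a sneaker": [0.0, 0.2, 0.9],
}
image_emb = [0.8, 0.2, 0.1]
catalog = {
    "red_dress.jpg": [0.9, 0.1, 0.0],
    "running_shoe.jpg": [0.0, 0.2, 0.9],
    "summer_dress.jpg": [0.7, 0.3, 0.1],
}
query_emb = [1.0, 0.1, 0.0]  # stands in for the embedding of the query "dress"

predicted_label = zero_shot_classify(image_emb, label_embs)
top_items = retrieve(query_emb, catalog, k=2)
print(predicted_label)  # a photo of a dress
print(top_items)        # ['red_dress.jpg', 'summer_dress.jpg']
```

Because both tasks reduce to nearest-neighbour search in the shared space, new labels or queries need no retraining, which is the kind of transferability the paper argues for.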

The steady rise of online shopping goes hand in hand with the development of increasingly complex ML and NLP models. While most use cases are cast as specialized supervised learning problems, we argue that practitioners would greatly benefit from more transferable representations of products. In this work, we build on recent developments in contrastive learning to train FashionCLIP, a CLIP-like model for the fashion industry. We showcase its capabilities for retrieval, classification and grounding, and release our model and code to the community.
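The contrastive training the abstract refers to can be sketched with CLIP's symmetric objective. Within a batch of matched image-caption pairs, each image should score its own caption highest among all captions, and each caption its own image. The sketch below is minimal and written in pure Python. It assumes FashionCLIP keeps the original CLIP loss. The function names and the fixed temperature of 0.07 are illustrative assumptions; CLIP itself learns the temperature, and real training runs batched on accelerators.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Project an embedding onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one row, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(image_embs: Sequence[Sequence[float]],
                          text_embs: Sequence[Sequence[float]],
                          temperature: float = 0.07) -> float:
    """Symmetric contrastive loss; pair i is (image_embs[i], text_embs[i])."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: temperature-scaled cosine similarity of image i and caption j
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: row i should put its mass on the diagonal entry i.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> image: column j should put its mass on the diagonal entry j.
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                   for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[0.9, 0.1], [0.1, 0.9]]   # caption i describes image i
shuffled_texts = [[0.1, 0.9], [0.9, 0.1]]  # captions swapped between pairs

aligned_loss = clip_contrastive_loss(images, aligned_texts)
shuffled_loss = clip_contrastive_loss(images, shuffled_texts)
print(aligned_loss, shuffled_loss)
```

With aligned pairs the diagonal dominates every row and column, so the loss is close to zero. Swapping the captions makes the off-diagonal entries dominate and drives the loss up. That gap is the signal that pulls matching product images and descriptions together in the shared space.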

Code Implementations: 1 repo
