CV · Mar 27, 2020

CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data

arXiv:2003.12299v2 · 43 citations
AI Analysis

This paper addresses image-text composition for fashion applications and represents an incremental improvement over existing methods.

The authors address the problem of measuring semantic distance between composed image-text embeddings in the fashion domain. They propose CurlingNet, whose Delivery and Sweeping components let a single model outperform previous state-of-the-art composition models such as TIRG and FiLM. An ensemble of the model achieved one of the best performances in the first Fashion-IQ challenge at ICCV 2019.

We present an approach named CurlingNet that measures the semantic distance between composed image-text embeddings. To learn an effective image-text composition for data in the fashion domain, our model has two key components. First, Delivery transitions a source image in an embedding space. Second, Sweeping emphasizes query-related components of fashion images in the embedding space. We use a channel-wise gating mechanism to make this possible. Our single model outperforms previous state-of-the-art image-text composition models, including TIRG and FiLM. We participated in the first Fashion-IQ challenge at ICCV 2019, where an ensemble of our model achieved one of the best performances.
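The abstract mentions a channel-wise gating mechanism but gives no formula, so the following is only a generic sketch of what such a gate can look like, not the authors' Delivery or Sweeping implementation. It assumes the gate is a sigmoid over a linear projection of the concatenated image and text embeddings, and that semantic distance is measured with cosine distance. The names `channel_gate`, `cosine_distance`, and the random weights are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, vec):
    """Dense layer: one output per weight row, i.e. weights @ vec + bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def channel_gate(image_emb, text_emb, weights, bias):
    """Scale each image channel by a gate in (0, 1).

    Assumption: the gate depends on both the image and the text query,
    so channels unrelated to the query can be suppressed.
    """
    joint = image_emb + text_emb  # list concatenation of the two embeddings
    gates = [sigmoid(z) for z in linear(weights, bias, joint)]
    gated = [g * x for g, x in zip(gates, image_emb)]
    return gated, gates


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy embeddings and untrained random weights, for illustration only.
rng = random.Random(0)
dim_img, dim_txt = 4, 3
image_emb = [rng.uniform(-1, 1) for _ in range(dim_img)]
text_emb = [rng.uniform(-1, 1) for _ in range(dim_txt)]
weights = [[rng.uniform(-1, 1) for _ in range(dim_img + dim_txt)]
           for _ in range(dim_img)]
bias = [0.0] * dim_img

composed, gates = channel_gate(image_emb, text_emb, weights, bias)
distance_to_source = cosine_distance(composed, image_emb)
```

In a retrieval setting, `composed` would be compared against candidate target-image embeddings and the candidates ranked by increasing distance. In a trained model the weights would be learned rather than sampled at random.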

Code Implementations: 1 repo