CV AI LG · Dec 6, 2021

A Tale of Color Variants: Representation and Self-Supervised Learning in Fashion E-Commerce

Ujjal Kr Dutta, Sandeep Repakula, Maulik Parmar, Abhinav Ravi

arXiv:2112.02910v1 · 2.61 citations · h-index: 4

Originality: Incremental advance

AI Analysis

The paper addresses an issue that matters to fashion e-commerce platforms for both customer experience and revenue, though its contribution is incremental: it applies existing self-supervised techniques to a specific domain.

The paper tackles the problem of identifying color variants of fashion products in e-commerce, proposing a supervised framework and showing that self-supervised learning with color jitter augmentation achieves comparable performance, with quantitative results demonstrating effectiveness.

In this paper, we address a crucial problem in fashion e-commerce (with respect to customer experience as well as revenue): color variants identification, i.e., identifying fashion products that match exactly in design (or style) and differ only in color. We propose a generic framework, built on deep visual Representation Learning, to address this problem for our fashion e-commerce platform. The framework can be trained with supervisory signals in the form of manually obtained triplets. However, it is infeasible to manually annotate the entire, huge collection of data usually present on fashion e-commerce platforms such as ours while also capturing all the difficult corner cases. Interestingly, we observed that the problem could also be solved by simple color-jitter-based image augmentation, which recently became widely popular in the contrastive Self-Supervised Learning (SSL) literature, a line of work that seeks to learn visual representations without manual labels. This naturally raised a question: could we leverage SSL in our use case and still obtain performance comparable to our supervised framework? The answer is yes, because color variant fashion objects are nothing but manifestations of one style in different colors, and a model trained to be invariant to color (with or without supervision) should be able to recognize this. The paper demonstrates this both qualitatively and quantitatively, evaluating a couple of state-of-the-art SSL techniques and also proposing a novel method.
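The intuition in the abstract, that a color-jittered copy of an image can stand in for a manually labeled color variant, can be illustrated with a small toy sketch. It is not the authors' framework: the images are short lists of RGB pixels, the jitter is a hue rotation only (color jitter in the SSL literature usually also perturbs brightness, contrast, and saturation), and `toy_embedding` is a hypothetical hand-written stand-in for a learned encoder. All function and variable names are assumptions made for illustration.

```python
import colorsys
import math


def hue_jitter(image, shift):
    """Rotate the hue of every RGB pixel by `shift` (a fraction of the color wheel)."""
    jittered = []
    for r, g, b in image:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        jittered.append(colorsys.hsv_to_rgb((h + shift) % 1.0, s, v))
    return jittered


def toy_embedding(image):
    """Hypothetical color-invariant descriptor: per-pixel HSV value (max RGB channel)."""
    return [max(pixel) for pixel in image]


def flatten(image):
    """Raw-pixel descriptor: all RGB channels concatenated."""
    return [channel for pixel in image for channel in pixel]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: the positive should be closer than the negative by `margin`."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)


# Anchor: a "striped" red item (alternating bright and dark pixels).
anchor = [(0.9, 0.1, 0.1), (0.3, 0.05, 0.05)] * 4
# Positive: the same stripes in another color, produced by jitter instead of a label.
positive = hue_jitter(anchor, 0.33)
# Negative: a different style (plain, no stripes) in the anchor's color.
negative = [(0.9, 0.1, 0.1)] * 8

# On raw pixels the color variant looks farther away than the different style.
raw_loss = triplet_loss(flatten(anchor), flatten(positive), flatten(negative))
# On the color-invariant descriptor the triplet is already satisfied.
invariant_loss = triplet_loss(
    toy_embedding(anchor), toy_embedding(positive), toy_embedding(negative)
)

print("raw-pixel triplet loss:", raw_loss)
print("color-invariant triplet loss:", invariant_loss)
```

In this toy setting the jittered image plays the role that a manually annotated positive would play in a supervised triplet, which is the substitution the abstract argues for. A real system would learn the embedding rather than hard-code it.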
