Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images
This addresses the challenge of capturing subtle style differences in fashion for applications like image retrieval and organization, though it is incremental as it builds on existing topic modeling techniques.
The paper tackles the problem of defining and modeling visual style in fashion images by proposing an unsupervised approach that learns a style-coherent representation using probabilistic polylingual topic models based on visual attributes, enabling tasks like retrieving, mixing, and summarizing outfits without style labels on over 100K images.
What defines a visual style? Fashion styles emerge organically from how people assemble outfits of clothing, making them difficult to pin down with a computational model. Low-level visual similarity can be too specific to detect stylistically similar images, while manually crafted style categories can be too abstract to capture subtle style differences. We propose an unsupervised approach to learn a style-coherent representation. Our method leverages probabilistic polylingual topic models based on visual attributes to discover a set of latent style factors. Given a collection of unlabeled fashion images, our approach mines for the latent styles, then summarizes outfits by how they mix those styles. Our approach can organize galleries of outfits by style without requiring any style labels. Experiments on over 100K images demonstrate its promise for retrieving, mixing, and summarizing fashion images by their style.