CV · CL · LG · Nov 17, 2022

GLAMI-1M: A Multilingual Image-Text Fashion Dataset

arXiv:2211.14451v1 · 6 citations · h-index: 16 · Has Code
Originality: Synthesis-oriented
AI Analysis

This paper provides a new benchmark for fine-grained fashion classification across multiple languages, though the contribution is incremental, as it builds on existing dataset and model paradigms.

The authors tackled the problem of multilingual image-text classification in fashion by introducing GLAMI-1M, the largest dataset of its kind with 1M images and descriptions in 13 languages, achieving a best accuracy of 69.7% with an EmbraceNet model.

We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in one of 13 languages. The categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents baselines for image-text classification showing that the dataset poses a challenging fine-grained classification problem: the best-scoring EmbraceNet model, using both visual and textual features, achieves 69.7% accuracy. Experiments with a modified Imagen model show that the dataset is also suitable for image generation conditioned on text. The dataset, source code and model checkpoints are published at https://github.com/glami/glami-1m
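
The abstract names EmbraceNet as the best model for combining visual and textual features. As a rough illustration of its core "embracement" idea, in which each dimension of the fused vector is copied from one modality chosen at random, here is a minimal pure-Python sketch. It assumes the image and text features have already been projected to a common length; the function name `embrace`, the uniform default weights, and the toy vectors are illustrative assumptions, not code from the GLAMI-1M repository.

```python
import random
from typing import List, Optional, Sequence


def embrace(
    features: List[List[float]],
    probs: Optional[Sequence[float]] = None,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Fuse modality vectors by picking, per dimension, one modality's value.

    features: one vector per modality (e.g. [image_vec, text_vec]), all of the
              same length c, i.e. already projected to a shared size (assumption).
    probs:    selection weight per modality; uniform if omitted (assumption).
    """
    m = len(features)
    c = len(features[0])
    if any(len(f) != c for f in features):
        raise ValueError("all modality vectors must have the same length")
    if rng is None:
        rng = random.Random()
    weights = list(probs) if probs is not None else [1.0] * m

    fused = []
    for j in range(c):
        # Draw which modality supplies dimension j of the fused vector.
        k = rng.choices(range(m), weights=weights)[0]
        fused.append(features[k][j])
    return fused


# Hypothetical toy vectors standing in for projected image and text features.
image_vec = [1.0, 1.0, 1.0, 1.0]
text_vec = [0.0, 0.0, 0.0, 0.0]
print(embrace([image_vec, text_vec], rng=random.Random(0)))  # each entry from one of the two vectors
print(embrace([image_vec, text_vec], probs=[0.0, 1.0]))      # [0.0, 0.0, 0.0, 0.0]
```

Setting one modality's weight to zero routes every dimension to the other, which hints at how this style of fusion can tolerate a missing input; the actual EmbraceNet also learns per-modality projection ("docking") layers that this sketch omits.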

Code Implementations: 1 repo