CV CL LG · Oct 5, 2021

FooDI-ML: a large multi-language dataset of food, drinks and groceries images and descriptions

David Amat Olóndriz, Ponç Palau Puigdevall, Adrià Salvador Palau

arXiv:2110.02035v25.611 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This dataset gives computer vision and NLP researchers diverse, multi-language data for tasks such as retrieval and generation, though the contribution is incremental in that it builds on existing visio-linguistic datasets.

The authors introduce FooDI-ML, a large dataset with over 1.5M images and 9.5M text descriptions of food, drinks, and groceries, covering 37 countries and 33 languages. It addresses the underrepresentation of languages such as Ukrainian and Kazakh, and it includes benchmarks for text-image retrieval and conditional image generation.

In this paper we introduce the FooDI-ML dataset. This dataset contains over 1.5M unique images and over 9.5M store names, product names, descriptions, and collection sections gathered from the Glovo application. The data made available corresponds to food, drinks and groceries products from 37 countries in Europe, the Middle East, Africa and Latin America. The dataset comprises 33 languages, including 870K samples in languages of Eastern European and Western Asian countries, such as Ukrainian and Kazakh, which have so far been underrepresented in publicly available visio-linguistic datasets. The dataset also includes widely spoken languages such as Spanish and English. To assist further research, we include benchmarks on two tasks: text-image retrieval and conditional image generation.

