CV · Jan 1, 2022

Turath-150K: Image Database of Arab Heritage

arXiv:2201.00220v1

Originality: Synthesis-oriented

AI Analysis

This work addresses cultural bias in image databases, which limits the usefulness of pre-trained networks and excludes machine learning researchers in under-represented regions. It contributes a new dataset and benchmarks, though the contribution is incremental in that it covers a single culture.

The authors address the lack of culturally diverse image databases by curating Turath-150K, a dataset of Arab heritage images. They show that existing networks pre-trained on ImageNet perform poorly on this data, then train and evaluate new networks for image classification.

Large-scale image databases remain largely biased towards objects and activities encountered in a select few cultures. This absence of culturally-diverse images, which we refer to as the hidden tail, limits the applicability of pre-trained neural networks and inadvertently excludes researchers from under-represented regions. To begin remedying this issue, we curate Turath-150K, a database of images of the Arab world that reflect objects, activities, and scenarios commonly found there. In the process, we introduce three benchmark databases, Turath Standard, Art, and UNESCO, specialised subsets of the Turath dataset. After demonstrating the limitations of existing networks pre-trained on ImageNet when deployed on such benchmarks, we train and evaluate several networks on the task of image classification. As a consequence of Turath, we hope to engage machine learning researchers in under-represented regions, and to inspire the release of additional culture-focused databases. The database can be accessed here: danikiyasseh.github.io/Turath.
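The abstract compares networks on image classification but does not say which metric is reported. As an illustration only, the sketch below assumes top-k accuracy, a common choice for such benchmarks. The function name `top_k_accuracy` and the toy scores and labels are hypothetical and are not taken from the Turath paper or its code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Hypothetical per-class scores for 4 images over 3 classes (not real Turath data).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.1, 0.3, 0.6],
]
labels = [0, 2, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Under this assumption, the same function could score both an ImageNet pre-trained network and a network trained on a Turath benchmark, so that the two are compared on identical images and labels.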
