Towards A Fairer Landmark Recognition Dataset
This work addresses biased data in computer vision for landmark recognition, which can harm both model fairness and accuracy. The contribution is incremental, since it builds on existing dataset creation methods.
The authors tackle bias in landmark recognition datasets by creating a new dataset with fair worldwide representation. They estimate landmark relevance from anonymized Google Maps user contributions combined with contributors' demographic information. The resulting dataset covers the world much more fairly than existing ones.
We introduce a new landmark recognition dataset, which is created with a focus on fair worldwide representation. While previous work proposes to collect as many images as possible from web repositories, we instead argue that such approaches can lead to biased data. To create a more comprehensive and equitable dataset, we start by defining the fair relevance of a landmark to the world population. These relevances are estimated by combining anonymized Google Maps user contribution statistics with the contributors' demographic information. We present a stratification approach and analysis which leads to a much fairer coverage of the world, compared to existing datasets. The resulting datasets are used to evaluate computer vision models as part of the Google Landmark Recognition and Retrieval Challenges 2021.
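To make the idea of reweighting contributions by demographics concrete, here is a minimal sketch of one possible stratified estimate. The formula, the function name `fair_relevance`, the group labels, and all numbers are assumptions made for illustration only. The abstract does not specify how relevance is actually computed, so this should not be read as the authors' method.

```python
from typing import Dict


def fair_relevance(
    contributions: Dict[str, int],
    contributors: Dict[str, int],
    population: Dict[str, float],
) -> float:
    """Reweight one landmark's contribution counts by demographic group.

    Assumed formula (illustrative, not taken from the paper):
        sum over groups g of (contributions_g / contributors_g) * population_g
    """
    total = 0.0
    for group, count in contributions.items():
        n_contributors = contributors.get(group, 0)
        if n_contributors == 0:
            # No contributors observed for this group, so no rate can be estimated.
            continue
        rate = count / n_contributors  # per-contributor contribution rate
        total += rate * population.get(group, 0.0)
    return total


# Hypothetical toy data: group "A" is over-represented among contributors.
contributors = {"A": 900, "B": 100}
population = {"A": 100.0, "B": 900.0}  # e.g. millions of people

landmark_x = {"A": 90}  # popular with the over-represented group
landmark_y = {"B": 10}  # popular with the under-represented group

raw_x, raw_y = sum(landmark_x.values()), sum(landmark_y.values())
fair_x = fair_relevance(landmark_x, contributors, population)
fair_y = fair_relevance(landmark_y, contributors, population)
```

In this toy setup, raw counts rank landmark X above landmark Y, while the reweighted score ranks Y above X, because Y is popular within a group that makes up a large share of the population but a small share of the contributors.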