AI | Aug 11, 2021

Analyzing Race and Country of Citizenship Bias in Wikidata

arXiv:2108.05412v1 | 10 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses bias in a widely used knowledge graph, which is important for ensuring accurate representation in STEM fields, though it is incremental as it extends prior gender-focused studies.

The study analyzes race and country-of-citizenship bias in Wikidata, particularly for STEM professionals, by comparing Wikidata query results with real-world datasets. It finds that white individuals and people with citizenship in Europe and North America are overrepresented, while other groups are generally underrepresented.

As an open and collaborative knowledge graph created by users and bots, Wikidata may contain knowledge that is biased with regard to multiple factors such as gender, race, and country of citizenship. Previous work has mostly studied the representativeness of Wikidata in terms of gender. In this paper, we examine race and citizenship bias both in general and with regard to STEM representation for scientists, software developers, and engineers. By comparing Wikidata queries to real-world datasets, we identify differences in representation and use them to characterize the biases present in Wikidata. Through this analysis, we discovered that white individuals and those with citizenship in Europe and North America are overrepresented; the remaining groups are generally underrepresented. Based on these findings, we have found and linked to Wikidata additional data about STEM scientists from minority groups. This data is ready to be inserted into Wikidata with a bot. Increasing the representation of minority race and country of citizenship groups can create a more accurate portrayal of individuals in STEM.
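The comparison step described in the abstract (Wikidata query results against a real-world dataset) can be illustrated with a minimal sketch. This is not the authors' method: the function names `shares` and `representation_ratio` are hypothetical, the group labels are generic, and the counts are made-up placeholders rather than figures from the paper. The sketch assumes one simple way to quantify over- and underrepresentation, namely the ratio of a group's share in Wikidata to its share in the reference dataset, where a ratio above 1 suggests overrepresentation and a ratio below 1 suggests underrepresentation.

```python
from typing import Dict


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts per group into proportions that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {group: n / total for group, n in counts.items()}


def representation_ratio(
    observed: Dict[str, int], reference: Dict[str, int]
) -> Dict[str, float]:
    """Ratio of each group's observed share to its reference share.

    Groups with a zero reference share are skipped because the ratio
    is undefined for them. Groups missing from `observed` count as 0.
    """
    obs = shares(observed)
    ref = shares(reference)
    ratios = {}
    for group, ref_share in ref.items():
        if ref_share == 0:
            continue
        ratios[group] = obs.get(group, 0.0) / ref_share
    return ratios


# Placeholder counts for illustration only (NOT data from the paper).
wikidata_counts = {"group_a": 80, "group_b": 15, "group_c": 5}
reference_counts = {"group_a": 60, "group_b": 25, "group_c": 15}

ratios = representation_ratio(wikidata_counts, reference_counts)
for group, ratio in sorted(ratios.items()):
    label = "over" if ratio > 1 else "under"
    print(f"{group}: ratio={ratio:.2f} ({label}represented)")
```

In practice the observed counts would come from a Wikidata query grouped by the attribute of interest, and the reference counts from whichever real-world dataset is treated as ground truth; the choice of reference strongly affects the resulting ratios.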

