Diagnosing and Mitigating Semantic Inconsistencies in Wikidata's Classification Hierarchy
This work addresses taxonomic errors in Wikidata, a central open knowledge graph, offering an incremental improvement for researchers and users who rely on accurate knowledge representation.
This study tackles taxonomic inconsistencies in Wikidata's classification hierarchy by proposing a validation method that identifies errors such as over-generalized subclass links and redundant connections, and by introducing a system that lets users inspect taxonomic relationships so that corrections can be crowdsourced.
Wikidata is currently the largest open knowledge graph on the web, encompassing over 120 million entities. It integrates data from various domain-specific databases and imports a substantial amount of content from Wikipedia, while also allowing users to freely edit its content. This openness has positioned Wikidata as a central resource in knowledge graph research and has enabled convenient knowledge access for users worldwide. However, its relatively loose editorial policy has also led to a degree of taxonomic inconsistency. Building on prior work, this study proposes and applies a novel validation method to confirm the presence of classification errors, over-generalized subclass links, and redundant connections in specific domains of Wikidata. We further introduce a new evaluation criterion for determining whether such issues warrant correction, and we develop a system that allows users to inspect the taxonomic relationships of arbitrary Wikidata entities, making full use of the platform's crowdsourced nature.
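To make the notion of a redundant connection concrete, the following is a minimal sketch and not the validation method proposed in this study. It assumes a common reading of "redundant": a direct subclass link (Wikidata's "subclass of" property, P279) is redundant when the same parent is already reachable through a longer chain of subclass links. The function name `find_redundant_subclass_links` and the toy class labels are hypothetical and chosen only for illustration; no real Wikidata identifiers or query results are used.

```python
from collections import defaultdict


def find_redundant_subclass_links(edges):
    """Return the (child, parent) links that are implied by a longer path.

    `edges` is an iterable of (child, parent) pairs, each standing for one
    "subclass of" statement. This is an illustrative assumption about the
    data layout, not the format used by the study.
    """
    edges = set(edges)
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    def reachable_without(start, target, skipped_edge):
        # Depth-first search from start to target that ignores skipped_edge.
        # The `seen` set guarantees termination even if the graph has cycles.
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            for nxt in parents.get(node, ()):
                if (node, nxt) == skipped_edge:
                    continue
                if nxt == target:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {(c, p) for (c, p) in edges if reachable_without(c, p, (c, p))}


# Toy hierarchy with hypothetical labels: "dog -> animal" duplicates the
# information already carried by "dog -> mammal -> animal".
toy_edges = [
    ("dog", "mammal"),
    ("mammal", "animal"),
    ("dog", "animal"),
    ("animal", "organism"),
]
redundant = find_redundant_subclass_links(toy_edges)
```

On the toy hierarchy, only the direct link from "dog" to "animal" is flagged. Whether a flagged link actually warrants correction is a separate question, which is the role of the evaluation criterion described above.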