Refining Wikidata Taxonomy using Large Language Models
This work addresses taxonomy inaccuracies in Wikidata, a critical resource for knowledge representation. The approach is incremental, however, because it builds on existing LLM and graph techniques.
The authors tackle the complex, error-prone Wikidata taxonomy by introducing WiKC, a version of the taxonomy refined automatically with Large Language Models and graph mining. They evaluate its quality both intrinsically and extrinsically, the latter through an entity typing task.
Due to its collaborative nature, Wikidata is known to have a complex taxonomy, with recurrent issues like the ambiguity between instances and classes, the inaccuracy of some taxonomic paths, the presence of cycles, and the high level of redundancy across classes. Manual efforts to clean up this taxonomy are time-consuming and prone to errors or subjective decisions. We present WiKC, a new version of the Wikidata taxonomy cleaned automatically using a combination of Large Language Models (LLMs) and graph mining techniques. Operations on the taxonomy, such as cutting links or merging classes, are performed with the help of zero-shot prompting on an open-source LLM. The quality of the refined taxonomy is evaluated from both intrinsic and extrinsic perspectives, the latter on an entity typing task, showing the practical interest of WiKC.
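Two of the structural issues named above, cycles and redundant links, can be illustrated with a small graph sketch. The abstract does not say which graph algorithms WiKC uses, so the code below is only an assumption-laden illustration: the taxonomy is modelled as (child, parent) "subclass of" pairs, the function names find_cycle and redundant_links are hypothetical, and the class labels are made up. It flags candidate links only; in the paper, deciding whether to cut a link or merge classes is left to the LLM.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle as a list of nodes (first == last), or None.

    edges: iterable of (child, parent) subclass-of pairs.
    Hypothetical helper, not taken from the WiKC paper.
    """
    graph = defaultdict(list)
    for child, parent in edges:
        graph[child].append(parent)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def dfs(node):
        color[node] = GRAY
        path.append(node)
        for parent in graph.get(node, ()):
            state = color.get(parent, WHITE)
            if state == GRAY:
                # parent is already on the current path: a cycle is closed
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = dfs(parent)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color.get(node, WHITE) == WHITE:
            found = dfs(node)
            if found:
                return found
    return None


def redundant_links(edges):
    """Return direct links (child, parent) that are also implied by a
    longer path, assuming the input graph is acyclic."""
    graph = defaultdict(set)
    for child, parent in edges:
        graph[child].add(parent)

    def reachable_without(start, target, skipped):
        seen, todo = set(), [start]
        while todo:
            node = todo.pop()
            for parent in graph.get(node, ()):
                if (node, parent) == skipped:
                    continue
                if parent == target:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    todo.append(parent)
        return False

    return sorted(
        (c, p) for c, p in set(edges) if reachable_without(c, p, (c, p))
    )


# Made-up example classes, for illustration only.
cyclic = [("a", "b"), ("b", "c"), ("c", "a")]
acyclic = [("cat", "mammal"), ("mammal", "animal"), ("cat", "animal")]
print(find_cycle(cyclic))
print(redundant_links(acyclic))
```

A real Wikidata dump is far larger than this toy input, so the recursive search would likely need to be made iterative, and candidate links would be passed on to the prompting step rather than removed directly.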