Approaches for Enriching and Improving Textual Knowledge Bases
This work addresses verifiability and enrichment issues in Wikipedia, which matter to users who rely on accurate and up-to-date information, though the contribution appears incremental because it builds on existing citation practices.
The thesis tackles the problem of incomplete and poorly cited Wikipedia entries by proposing automated methods that enforce verifiability and suggest missing news references, with the aim of improving the reliability and completeness of textual knowledge bases.
Verifiability is one of the core editing principles in Wikipedia, where editors are encouraged to provide citations for the statements they add. A statement can be any piece of text, ranging from a single sentence to a paragraph. However, in many cases citations are outdated, missing, or link to references that no longer exist (e.g., dead URLs or moved content). In 20\% of cases, such citations refer to news articles, which makes news the second most cited source. Even where citations are provided, there is no explicit indicator of the span of text that a citation covers. In addition to issues related to the verifiability principle, many Wikipedia entity pages are incomplete and lack relevant information that is already available in online news sources. Even for existing citations, there is often a delay between the time the news is published and the time it is referenced. In this thesis, we address these issues and propose automated approaches that enforce the verifiability principle in Wikipedia and suggest relevant, missing news references to further enrich Wikipedia entity pages.