CL · Mar 5, 2015

Studying the Wikipedia Hyperlink Graph for Relatedness and Disambiguation

arXiv:1503.01655v2 · 1 citation · Has Code
AI Analysis

This work aims to improve knowledge-based NLP tasks by making better use of Wikipedia's link structure. It is incremental, since it builds on existing random walk methods.

The authors use Wikipedia's hyperlink graph for word relatedness and named-entity disambiguation. They show that the full graph outperforms direct links alone by a large margin, with results comparable to state-of-the-art systems that combine several information sources or use supervised learning.

Hyperlinks and other relations in Wikipedia are an extraordinary resource which is still not fully understood. In this paper we study the different types of links in Wikipedia, and contrast the use of the full graph with the use of direct links alone. We apply a well-known random walk algorithm to two tasks, word relatedness and named-entity disambiguation. We show that using the full graph is more effective than using just direct links by a large margin, that non-reciprocal links harm performance, and that there is no benefit from categories and infoboxes, with coherent results on both tasks. We set new state-of-the-art figures for systems based on Wikipedia links, comparable to systems exploiting several information sources and/or supervised machine learning. Our approach is open source, with instructions to reproduce the results, and amenable to integration with complementary text-based methods.
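
To make the idea of a random walk over the hyperlink graph concrete, here is a minimal sketch. It assumes personalized PageRank as the random walk and cosine similarity between the resulting rank vectors as the relatedness score; the abstract names neither, so both are illustrative choices rather than the authors' implementation. The toy graph, the article names, and the helper names (`reciprocal_only`, `personalized_pagerank`, `relatedness`) are hypothetical.

```python
from math import sqrt


def reciprocal_only(links):
    """Keep an edge a -> b only if b -> a also exists (assumed filter)."""
    return {
        a: {b for b in outs if a in links.get(b, set())}
        for a, outs in links.items()
    }


def personalized_pagerank(links, seed, damping=0.85, iterations=50):
    """Power iteration where every teleport jumps back to the seed node."""
    nodes = set(links) | {b for outs in links.values() for b in outs}
    rank = {n: 0.0 for n in nodes}
    rank[seed] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            outs = links.get(n, set())
            if outs:
                share = damping * rank[n] / len(outs)
                for m in outs:
                    nxt[m] += share
            else:
                # Dangling node: return its walking mass to the seed.
                nxt[seed] += damping * rank[n]
        # Teleport mass; total rank stays at 1 across iterations.
        nxt[seed] += 1.0 - damping
        rank = nxt
    return rank


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relatedness(links, a, b):
    """Compare the walk distributions seeded at articles a and b."""
    return cosine(personalized_pagerank(links, a),
                  personalized_pagerank(links, b))


# Hypothetical toy link graph: article -> set of articles it links to.
links = {
    "Jaguar": {"Felidae", "Car"},
    "Felidae": {"Jaguar", "Leopard"},
    "Leopard": {"Felidae"},
    "Car": {"Engine"},
    "Engine": {"Car"},
}

full_score = relatedness(links, "Jaguar", "Leopard")
reciprocal_score = relatedness(reciprocal_only(links), "Jaguar", "Leopard")
```

"Jaguar" and "Leopard" share no direct link in this toy graph, yet the walk still relates them through "Felidae", which is the kind of indirect evidence a direct-link comparison cannot use. Passing the output of `reciprocal_only` instead of the full link set gives one way to compare link types, in the spirit of the abstract's observation about non-reciprocal links.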

Code Implementations: 1 repo