CL · Mar 5, 2015

Studying the Wikipedia Hyperlink Graph for Relatedness and Disambiguation

Eneko Agirre, Ander Barrena, Aitor Soroa

arXiv:1503.01655v22.21 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving knowledge-based NLP tasks by better utilizing Wikipedia's link structure, though it is incremental as it builds on existing random walk methods.

The authors tackled the problem of leveraging Wikipedia's hyperlink graph for word relatedness and named-entity disambiguation, showing that using the full graph significantly outperforms direct links alone, with results comparable to state-of-the-art systems using multiple sources or supervised learning.

Hyperlinks and other relations in Wikipedia are an extraordinary resource which is still not fully understood. In this paper we study the different types of links in Wikipedia, and contrast the use of the full graph with the use of direct links alone. We apply a well-known random walk algorithm to two tasks, word relatedness and named-entity disambiguation. We show that using the full graph is more effective than using just direct links by a large margin, that non-reciprocal links harm performance, and that there is no benefit from categories and infoboxes, with coherent results on both tasks. We set new state-of-the-art figures for systems based on Wikipedia links, comparable to systems exploiting several information sources and/or supervised machine learning. Our approach is open source, with instructions to reproduce the results, and amenable to integration with complementary text-based methods.
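
To make the idea in the abstract concrete, here is a minimal sketch of a random walk over a link graph restricted to reciprocal links. It rests on several assumptions that the abstract does not state: the random walk is taken to be Personalized PageRank, relatedness is taken to be the cosine similarity of two rank vectors, and the page names and helper functions (`reciprocal_edges`, `personalized_pagerank`, `relatedness`) are hypothetical. It is not the authors' released code.

```python
import math

# Toy directed link graph: page -> pages it links to (hypothetical data).
links = {
    "Tiger": ["Cat", "Lion", "Zoo"],
    "Cat": ["Tiger", "Lion", "Jaguar"],
    "Lion": ["Tiger", "Cat"],
    "Jaguar": ["Cat", "Car"],
    "Car": ["Jaguar", "Engine"],
    "Engine": ["Car"],
    "Zoo": [],
}


def reciprocal_edges(links):
    """Keep a link a->b only if b->a also exists; return undirected adjacency."""
    adj = {}
    for a, targets in links.items():
        for b in targets:
            if a != b and a in links.get(b, ()):
                adj.setdefault(a, set()).add(b)
                adj.setdefault(b, set()).add(a)
    return adj


def personalized_pagerank(adj, seed, damping=0.85, iters=50):
    """Power iteration where the teleport mass always returns to `seed`."""
    if seed not in adj:
        raise ValueError(f"seed {seed!r} has no reciprocal links")
    nodes = sorted(adj)
    rank = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            # Every node in adj has at least one neighbour by construction.
            share = damping * rank[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        nxt[seed] += 1.0 - damping
        rank = nxt
    return rank


def cosine(u, v):
    """Cosine similarity of two rank vectors defined over the same nodes."""
    dot = sum(u[k] * v[k] for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def relatedness(adj, a, b):
    """Compare the random-walk distributions seeded at pages a and b."""
    return cosine(personalized_pagerank(adj, a), personalized_pagerank(adj, b))


if __name__ == "__main__":
    graph = reciprocal_edges(links)
    print("Tiger/Lion:", relatedness(graph, "Tiger", "Lion"))
    print("Tiger/Engine:", relatedness(graph, "Tiger", "Engine"))
```

In this toy graph the one-way link from "Tiger" to "Zoo" is dropped, and pages in the same densely linked cluster score higher than pages that are connected only through a long path, which is the sort of signal that direct links alone cannot supply.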
