LG IR SIOct 31, 2022

Learning to Navigate Wikipedia by Taking Random Walks

Manzil Zaheer, Kenneth Marino, Will Grathwohl, John Schultz, Wendy Shang, Sheila Babayan, Arun Ahuja, Ishita Dasgupta, Christine Kaeser-Chen, Rob Fergus

arXiv:2211.00177v16.99 citationsh-index: 78

Originality Incremental advance

AI Analysis

This addresses the challenge of efficient information retrieval on large-scale knowledge graphs like Wikipedia, offering a complementary approach to search engines, though it is incremental as it builds on existing behavioral cloning and graph navigation methods.

The paper tackles the problem of learning to navigate Wikipedia by hyperlink selection, showing that behavioral cloning of random walks can achieve 96% and 92% success rates for distances of 5 and 20 steps, respectively, and the learned embeddings and policy are competitive in downstream tasks like fact verification and question answering.

A fundamental ability of an intelligent web-based agent is seeking out and acquiring new information. Internet search engines reliably find the correct vicinity but the top results may be a few links away from the desired target. A complementary approach is navigation via hyperlinks, employing a policy that comprehends local content and selects a link that moves it closer to the target. In this paper, we show that behavioral cloning of randomly sampled trajectories is sufficient to learn an effective link selection policy. We demonstrate the approach on a graph version of Wikipedia with 38M nodes and 387M edges. The model is able to efficiently navigate between nodes 5 and 20 steps apart 96% and 92% of the time, respectively. We then use the resulting embeddings and policy in downstream fact verification and question answering tasks where, in combination with basic TF-IDF search and ranking methods, they are competitive results to the state-of-the-art methods.

View on arXiv PDF

Similar