CL · IR · Jan 25, 2014

Keyword and Keyphrase Extraction Using Centrality Measures on Collocation Networks

Shibamouli Lahiri, Sagnik Ray Choudhury, Cornelia Caragea

arXiv:1401.6571v1 · 74 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of efficient keyword extraction for NLP applications like summarization and search, offering incremental improvements by exploring alternative centrality measures.

The paper tackles keyword and keyphrase extraction by testing various centrality measures on collocation networks. It finds that simpler measures such as degree and strength perform as well as or better than PageRank on four benchmark datasets, and that centrality-based results are competitive with, and in some cases better than, two strong unsupervised baselines.

Keyword and keyphrase extraction is an important problem in natural language processing, with applications ranging from summarization to semantic search to document clustering. Graph-based approaches to keyword and keyphrase extraction avoid the problem of acquiring a large in-domain training corpus by applying variants of the PageRank algorithm on a network of words. Although graph-based approaches are knowledge-lean and easily adoptable in online systems, it remains largely open whether they can benefit from centrality measures other than PageRank. In this paper, we experiment with an array of centrality measures on word and noun phrase collocation networks, and analyze their performance on four benchmark datasets. Not only are there centrality measures that perform as well as or better than PageRank, but they are much simpler (e.g., degree, strength, and neighborhood size). Furthermore, centrality-based methods give results that are competitive with and, in some cases, better than two strong unsupervised baselines.
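
To make the simpler measures named in the abstract concrete, the following is a minimal sketch of degree and strength computed on a word collocation network. It is not the authors' implementation: the function names, the adjacent-word window, the absence of stopword filtering, and the alphabetical tie-breaking are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_collocation_network(tokens, window=2):
    """Build an undirected weighted graph from a token list.

    Assumption: each token is linked to the next `window - 1` tokens,
    and the edge weight counts how often the two words co-occur.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other == word:
                continue  # skip self-loops
            graph[word][other] += 1
            graph[other][word] += 1
    return graph


def degree(graph):
    """Degree: number of distinct neighbors of each word."""
    return {word: len(nbrs) for word, nbrs in graph.items()}


def strength(graph):
    """Strength: sum of the edge weights incident to each word."""
    return {word: sum(nbrs.values()) for word, nbrs in graph.items()}


def top_keywords(scores, k=3):
    """Return the k highest-scoring words, ties broken alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical toy input; a real pipeline would tokenize and filter a document.
toy_tokens = "centrality measures rank words in word networks by centrality".split()
toy_graph = build_collocation_network(toy_tokens)
print(top_keywords(strength(toy_graph)))
```

Both measures need only a single pass over each word's neighbor list, whereas PageRank requires iterating to convergence, which is the sense in which the abstract calls them simpler.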
