WikiRank: Improving Keyphrase Extraction Based on Background Knowledge
This work addresses keyphrase extraction from documents, a problem relevant to NLP researchers and practitioners. The contribution is incremental, since it builds on existing graph-based methods.
The paper tackles keyphrase extraction by incorporating Wikipedia background knowledge into an unsupervised method called WikiRank, achieving over 2% improvement in F1-score over state-of-the-art models.
Keyphrases are an efficient representation of the main ideas of a document. Although background knowledge can provide valuable information about documents, it is rarely incorporated into keyphrase extraction methods. In this paper, we propose WikiRank, an unsupervised method for keyphrase extraction based on background knowledge from Wikipedia. First, we construct a semantic graph for the document. Then, we transform the keyphrase extraction problem into an optimization problem on this graph. Finally, we output the optimal keyphrase set. Our method improves on other state-of-the-art models by more than 2% in F1-score.
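The three steps above (build a graph, pose an optimization problem, select a keyphrase set) can be illustrated with a small sketch. The abstract does not specify the graph structure or the objective, so everything here is an assumption made for illustration: the graph is taken to be bipartite, linking candidate phrases to Wikipedia concepts; the objective is the total weight of concepts covered; and the solver is a simple greedy heuristic, not necessarily the one WikiRank uses. The names `greedy_keyphrases`, `candidate_concepts`, and `concept_weights`, as well as the toy data, are hypothetical.

```python
from typing import Dict, List, Set


def greedy_keyphrases(
    candidate_concepts: Dict[str, Set[str]],
    concept_weights: Dict[str, float],
    k: int,
) -> List[str]:
    """Greedily pick up to k candidates that maximize covered concept weight.

    candidate_concepts: assumed bipartite graph, candidate phrase -> linked concepts.
    concept_weights: assumed importance of each concept in the document.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(candidate_concepts)

    for _ in range(k):
        best, best_gain = None, 0.0
        # Iterate in sorted order so that ties are broken deterministically.
        for phrase in sorted(remaining):
            new_concepts = remaining[phrase] - covered
            gain = sum(concept_weights.get(c, 0.0) for c in new_concepts)
            if gain > best_gain:
                best, best_gain = phrase, gain
        if best is None:
            # No remaining candidate adds any uncovered concept weight.
            break
        chosen.append(best)
        covered |= remaining.pop(best)

    return chosen


# Toy data, invented for illustration only.
graph = {
    "keyphrase extraction": {"Keyword extraction", "Information extraction"},
    "background knowledge": {"Knowledge base", "Wikipedia"},
    "semantic graph": {"Semantic network", "Graph (discrete mathematics)"},
    "graph": {"Graph (discrete mathematics)"},
}
weights = {
    "Keyword extraction": 3.0,
    "Information extraction": 1.0,
    "Knowledge base": 1.0,
    "Wikipedia": 2.0,
    "Semantic network": 2.0,
    "Graph (discrete mathematics)": 1.0,
}

print(greedy_keyphrases(graph, weights, k=2))
```

In this sketch, a candidate such as "graph" is never selected once "semantic graph" has been chosen, because it covers no additional concept. This is one way a coverage-style objective can discourage redundant keyphrases; whether WikiRank's actual objective behaves the same way is not stated in the abstract.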