CL · Aug 21, 2020

Keywords lie far from the mean of all words in local vector space

Eirini Papagiannopoulou, Grigorios Tsoumakas, Apostolos N. Papadopoulos

arXiv:2008.09513v1 · 0.2 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses keyword extraction for document processing, presenting a novel method that is competitive but incremental in nature.

The paper tackles keyword extraction by modeling the distribution of a document's words with local vector representations, then ranking candidates by their position in the text and their distance from the distribution's center. An extended experimental study reports high performance against strong baselines and state-of-the-art unsupervised methods.

Keyword extraction is an important document process that aims at finding a small set of terms that concisely describe a document's topics. The most popular state-of-the-art unsupervised approaches belong to the family of the graph-based methods that build a graph-of-words and use various centrality measures to score the nodes (candidate keywords). In this work, we follow a different path to detect the keywords from a text document by modeling the main distribution of the document's words using local word vector representations. Then, we rank the candidates based on their position in the text and the distance between the corresponding local vectors and the main distribution's center. We confirm the high performance of our approach compared to strong baselines and state-of-the-art unsupervised keyword extraction methods, through an extended experimental study, investigating the properties of the local representations.
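The ranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `rank_candidates` function, the toy two-dimensional vectors, and the position weight `1 / (1 + ln(position))` are all assumptions made for illustration, and the paper's actual scoring formula and the way it trains local vectors may differ.

```python
import math
import numpy as np

def rank_candidates(words, vectors):
    """Rank distinct words by distance from the mean vector, favoring early positions.

    words   : tokens in document order
    vectors : dict mapping each token to a 1-D numpy array
              (hypothetical stand-in for local word vectors)
    """
    # Distinct words, remembering the 1-based position of their first occurrence.
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i + 1)
    vocab = list(first_pos)

    matrix = np.stack([vectors[w] for w in vocab])
    center = matrix.mean(axis=0)  # mean of all word vectors in the document
    distances = np.linalg.norm(matrix - center, axis=1)

    # Assumed position weight (not from the paper): earlier words score higher.
    weights = np.array([1.0 / (1.0 + math.log(first_pos[w])) for w in vocab])

    scores = distances * weights
    order = np.argsort(-scores)  # highest score first
    return [(vocab[i], float(scores[i])) for i in order]

# Toy example: "keyword" lies far from the other vectors, so it ranks first.
toy_words = ["keyword", "extraction", "the", "of", "the"]
toy_vectors = {
    "keyword": np.array([5.0, 5.0]),
    "extraction": np.array([0.2, 0.1]),
    "the": np.array([0.0, 0.0]),
    "of": np.array([0.1, 0.0]),
}
ranked = rank_candidates(toy_words, toy_vectors)
```

In this sketch a word scores highly when its vector is far from the document mean and it appears early in the text, which mirrors the two signals the abstract names. How the two signals are combined here is a design choice of the sketch only.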
