Enhancing Unsupervised Keyword Extraction in Academic Papers through Integrating Highlights with Abstract
For researchers in NLP and information retrieval, this work provides a simple yet effective enhancement to unsupervised keyword extraction by leveraging a previously underutilized section of academic papers.
This paper investigates the impact of incorporating highlights with abstracts for unsupervised keyword extraction in academic papers. Experiments on CS and LIS datasets show that integrating highlights with abstracts significantly improves extraction performance across four unsupervised models.
Automatic keyword extraction from academic papers is a key area of interest in natural language processing and information retrieval. Although previous research has mainly focused on using the abstract and references for keyword extraction, this paper focuses on the highlights section: a brief summary of the key findings and contributions that gives readers a quick overview of the research. Our observations indicate that highlights contain valuable keyword information that can effectively complement the abstract. To investigate the impact of incorporating highlights into unsupervised keyword extraction, we evaluate three input scenarios: the abstract alone, the highlights alone, and the abstract combined with the highlights. Experiments with four unsupervised models on Computer Science (CS) and Library and Information Science (LIS) datasets show that combining the abstract with the highlights significantly improves extraction performance. Furthermore, we examine differences in keyword coverage and content between the abstract and the highlights, and explore how these differences influence extraction outcomes. The data and code are available at https://github.com/xiangyi-njust/Highlight-KPE.
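The three input scenarios can be illustrated with a minimal sketch. It assumes a toy frequency-based extractor (`extract_keywords`), an exact-match F1 score (`f1_score`), and an invented example document; all of these are hypothetical stand-ins and are not the four unsupervised models, the datasets, or the evaluation protocol used in this work.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; a real system would use a fuller list.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
             "of", "on", "our", "the", "this", "to", "we", "with"}


def build_inputs(abstract: str, highlights: list[str]) -> dict[str, str]:
    """Build the three input scenarios: abstract, highlights, and both."""
    highlight_text = " ".join(highlights)
    return {
        "abstract": abstract,
        "highlights": highlight_text,
        "abstract+highlights": f"{abstract} {highlight_text}",
    }


def extract_keywords(text: str, k: int = 5) -> list[str]:
    """Toy unsupervised extractor: rank non-stopword tokens by frequency."""
    tokens = re.findall(r"[a-z][a-z-]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # Higher frequency first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def f1_score(predicted: list[str], gold: set[str]) -> float:
    """Exact-match F1 between predicted and gold keywords."""
    hits = len(set(predicted) & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented toy document; not taken from the CS or LIS datasets.
    abstract = "We study keyword extraction from academic papers using graph ranking."
    highlights = ["Highlights summarize key findings.",
                  "Highlights improve keyword extraction."]
    gold = {"keyword", "extraction", "highlights"}
    for name, text in build_inputs(abstract, highlights).items():
        predicted = extract_keywords(text, k=5)
        print(f"{name:20s} F1@5={f1_score(predicted, gold):.2f} {predicted}")
```

On this toy input, only the combined scenario places all three gold keywords in its top five (F1@5 of 0.50, 0.50, and 0.75 for abstract, highlights, and both), because terms repeated across the two sections gain frequency. This only illustrates the intuition of complementary coverage; it is not evidence for the reported results.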