CL · Oct 3, 2021

Multi-Document Keyphrase Extraction: Dataset, Baselines and Review

arXiv:2110.01073v2
Originality: Synthesis-oriented
AI Analysis

This paper addresses a gap for NLP researchers by providing a benchmark for a previously understudied task, though the contribution is incremental: it focuses on dataset creation rather than novel methods.

The authors address the lack of a dataset for multi-document keyphrase extraction by creating MK-DUC-01, the first such dataset, and testing baseline methods on it.

Keyphrase extraction has been extensively researched within the single-document setting, with an abundance of methods, datasets and applications. In contrast, multi-document keyphrase extraction has been infrequently studied, despite its utility for describing sets of documents, and its use in summarization. Moreover, no prior dataset exists for multi-document keyphrase extraction, hindering the progress of the task. Recent advances in multi-text processing make the task an even more appealing challenge to pursue. To stimulate this pursuit, we present here the first dataset for the task, MK-DUC-01, which can serve as a new benchmark, and test multiple keyphrase extraction baselines on our data. In addition, we provide a brief, yet comprehensive, literature review of the task.

Code Implementations: 1 repo
