IR, CL · Jun 28, 2021

Keyphrase Generation for Scientific Document Retrieval

arXiv:2106.14726
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving scientific document retrieval for researchers and information systems. It is an incremental advance because it builds on existing keyphrase generation methods.

The study asks whether sequence-to-sequence keyphrase generation models can reliably improve scientific document retrieval. It finds that they significantly improve retrieval performance, and it demonstrates those improvements through a new extrinsic evaluation framework.

Sequence-to-sequence models have led to significant progress in keyphrase generation, but it remains unknown whether they are reliable enough to be beneficial for document retrieval. This study provides empirical evidence that such models can significantly improve retrieval performance, and introduces a new extrinsic evaluation framework that allows for a better understanding of the limitations of keyphrase generation models. Using this framework, we point out and discuss the difficulties encountered with supplementing documents with keyphrases that are not present in the text, and with generalizing models across domains. Our code is available at https://github.com/boudinfl/ir-using-kg
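To make the idea of supplementing documents with keyphrases concrete, here is a minimal sketch of document expansion followed by ranking. It is an illustration under stated assumptions, not the authors' pipeline: the two-document corpus, the hard-coded `generated` keyphrases (standing in for model output), the helper names, and the choice of a BM25-style scorer are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the `query` tokens."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def expand(text, keyphrases):
    """Append keyphrases to the document text before indexing."""
    return text + " " + " ".join(keyphrases)


# Toy corpus (assumption): the first document never uses the query words.
corpus = [
    "we train an encoder decoder network to produce topical phrases for papers",
    "a study of protein folding with molecular dynamics simulations",
]
# Hypothetical keyphrases standing in for the output of a generation model.
generated = [
    ["keyphrase generation", "document retrieval"],
    ["protein structure", "simulation"],
]

query = tokenize("keyphrase generation")
plain = bm25_scores(query, [tokenize(d) for d in corpus])
expanded = bm25_scores(
    query, [tokenize(expand(d, k)) for d, k in zip(corpus, generated)]
)

print("without keyphrases:", plain)
print("with keyphrases:   ", expanded)
```

In this toy setup the query matches nothing in the original text, so only the expanded index can retrieve the first document. That is the case the abstract singles out as difficult: keyphrases absent from the text add new matching terms only if the model generates them correctly.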
