CL | May 29, 2023

Data Augmentation for Low-Resource Keyphrase Generation

arXiv:2305.17968v1 | 222 citations | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generating keyphrases for domains with limited annotated data. The advance is incremental because it builds on existing low-resource methods.

The paper tackles keyphrase generation in low-resource settings by developing data augmentation strategies that use the full text of articles to improve both present and absent keyphrase generation, and it reports consistent improvements over state-of-the-art performance across three datasets.

Keyphrase generation is the task of summarizing the contents of any given article into a few salient phrases (or keyphrases). Existing works for the task mostly rely on large-scale annotated datasets, which are not easy to acquire. Very few works address the problem of keyphrase generation in low-resource settings, but they still rely on a lot of additional unlabeled data for pretraining and on automatic methods for pseudo-annotations. In this paper, we present data augmentation strategies specifically to address keyphrase generation in purely resource-constrained domains. We design techniques that use the full text of the articles to improve both present and absent keyphrase generation. We test our approach comprehensively on three datasets and show that the data augmentation strategies consistently improve the state-of-the-art performance. We release our source code at https://github.com/kgarg8/kpgen-lowres-data-aug.
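The abstract distinguishes present from absent keyphrases without defining them. A common convention in the keyphrase generation literature is that a present keyphrase occurs verbatim in the source text, while an absent one does not. The following is a minimal sketch of that split, not code from the paper's repository: the function names `tokenize`, `contains_phrase`, and `split_present_absent` are hypothetical, and matching is done on lowercased tokens without stemming, which is a simplifying assumption.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (assumption: no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(doc_tokens, phrase_tokens):
    """Return True if phrase_tokens occurs as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def split_present_absent(text, keyphrases):
    """Partition keyphrases by whether they appear verbatim in the text."""
    doc_tokens = tokenize(text)
    present, absent = [], []
    for kp in keyphrases:
        if contains_phrase(doc_tokens, tokenize(kp)):
            present.append(kp)
        else:
            absent.append(kp)
    return present, absent


if __name__ == "__main__":
    text = "We study data augmentation for keyphrase generation."
    gold = ["data augmentation", "low-resource learning", "Keyphrase Generation"]
    present, absent = split_present_absent(text, gold)
    print("present:", present)
    print("absent:", absent)
```

Under this convention, which text is matched against (for example, title and abstract only versus the full article) determines whether a given keyphrase counts as present or absent.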

Code Implementations: 1 repo