CL · Aug 25, 2020

Concept Extraction Using Pointer-Generator Networks

arXiv:2008.11295v1 · Has code

Originality: Incremental advance

AI Analysis

This paper addresses concept extraction for downstream applications, offering a generic open-domain model that is an incremental advance over existing methods.

The paper proposes a pointer-generator network for concept extraction, trained on a large corpus compiled from Wikipedia. The model significantly outperforms standard techniques such as DBpedia Spotlight, further improves performance when applied on top of DBpedia Spotlight, and achieves state-of-the-art results when ported to other datasets.
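The copy mechanism is what lets a pointer-generator emit concepts that are absent from its output vocabulary. The sketch below assumes the standard mixture from See et al. (2017); the paper does not spell out its exact formulation here. At each decoding step the final distribution is `p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w)`. The function name, its arguments and the toy numbers are hypothetical and for illustration only.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Mix the generation and copy distributions for one decoding step.

    Assumed pointer-generator formulation (See et al., 2017 style):
      p_gen         -- probability of generating from the fixed vocabulary
      vocab_probs   -- dict word -> P_vocab(word), summing to 1
      attention     -- attention weights over source positions, summing to 1
      source_tokens -- source tokens aligned with `attention`
    Returns a dict word -> P(word) that can include out-of-vocabulary
    source words, which receive probability only through copying.
    """
    dist = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for word, p in vocab_probs.items():
        dist[word] += p_gen * p
    # Copy part: add (1 - p_gen) times the attention on each source position.
    # Repeated source tokens accumulate their attention mass.
    for token, a in zip(source_tokens, attention):
        dist[token] += (1.0 - p_gen) * a
    return dict(dist)


# Hypothetical example: "Princeton" is not in the output vocabulary,
# but it can still be produced because the decoder attends to it.
source = ["Alan", "Turing", "studied", "at", "Princeton"]
vocab = {"studied": 0.5, "at": 0.3, "<sep>": 0.2}
attn = [0.1, 0.1, 0.1, 0.1, 0.6]
probs = final_distribution(0.2, vocab, attn, source)
```

With a low `p_gen`, most of the probability mass comes from attention over the input. This suits an extractive task where concepts are spans of the source text.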

Concept extraction is crucial for a number of downstream applications. However, surprisingly enough, straightforward single-token/nominal-chunk concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic open-domain, OOV-oriented (out-of-vocabulary) extractive model based on distant supervision of a pointer-generator network that leverages bidirectional LSTMs and a copy mechanism. The model was trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages, and tested on regular pages, where the links to other pages are treated as ground-truth concepts. The experiments show that our model significantly outperforms standard techniques and, when used on top of DBpedia Spotlight, further improves its performance. The experiments also show that the model can be readily ported to other datasets, on which it likewise achieves state-of-the-art performance.
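The distant-supervision setup treats links to other Wikipedia pages as gold concepts. The sketch below shows one way such labels could be derived from wikitext and scored. It assumes simplified `[[Target]]` / `[[Target|anchor]]` links with no file, category or nested links, and exact-match, set-based precision/recall/F1. Both are assumptions; the paper's actual preprocessing and metric may differ. `gold_concepts` and `prf` are hypothetical helper names.

```python
import re

# Simplified wiki-link pattern: [[Target]] or [[Target|anchor text]].
# Assumption: no nested links and no File:/Category: links in the input.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def gold_concepts(wikitext):
    """Strip link markup and return (plain_text, gold_concepts).

    The surface form of each link (the anchor if present, else the target)
    is kept in the text and recorded as a ground-truth concept.
    """
    concepts = []

    def replace_link(match):
        target, anchor = match.group(1), match.group(2)
        surface = (anchor or target).strip()
        concepts.append(surface)
        return surface

    text = LINK_RE.sub(replace_link, wikitext)
    return text, concepts


def prf(predicted, gold):
    """Exact-match precision, recall and F1 over sets of concept strings."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


text, gold = gold_concepts("[[Alan Turing]] studied at [[Princeton University|Princeton]].")
# text -> "Alan Turing studied at Princeton."; gold -> ["Alan Turing", "Princeton"]
```

A model prediction such as `["Alan Turing", "Turing"]` would then score `prf(..., gold) == (0.5, 0.5, 0.5)` under this exact-match assumption. A looser, span-overlap metric would give partial credit to "Turing".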
