CL · AI · LG · Dec 4, 2020

Delexicalized Paraphrase Generation

arXiv:2012.02763v1990 citations
AI Analysis

This work addresses the problem of generating high-quality delexicalized paraphrases to improve natural language understanding (NLU) systems, a task of particular interest to NLU developers.

This paper introduces a neural model that generates delexicalized paraphrases. It is trained on data in which each input is paired with a set of reference paraphrases, where the sets represent a weak form of semantic equivalence based on annotated slots and intents. The generated paraphrases yield an additional 1.29% exact match on live utterances, and NLU tasks such as intent classification and named entity recognition benefit from data augmentation with them.

We present a neural model for paraphrasing and train it to generate delexicalized sentences. We achieve this by creating training data in which each input is paired with a number of reference paraphrases. These sets of reference paraphrases represent a weak type of semantic equivalence based on annotated slots and intents. To understand semantics from different types of slots, beyond anonymizing the slots, we apply convolutional neural networks (CNNs) prior to pooling on slot values and use pointers to locate slots in the output. We show empirically that the generated paraphrases are of high quality, leading to an additional 1.29% exact match on live utterances. We also show that natural language understanding (NLU) tasks, such as intent classification and named entity recognition, can benefit from data augmentation using automatically generated paraphrases.
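As a rough illustration of the data-creation step described above, the sketch below delexicalizes annotated utterances and groups them into reference paraphrase sets. It is not the authors' pipeline: the grouping rule (same intent and same set of slot types), the `{slot}` placeholder format, the function names `delexicalize` and `build_reference_sets`, and the sample utterances are all assumptions made for this example.

```python
from collections import defaultdict


def delexicalize(tokens, slots):
    """Replace each annotated slot span with a placeholder such as {artist}.

    `slots` is a list of (start, end, slot_name) token spans, end exclusive.
    This span format is an assumption made for the sketch.
    """
    spans = {start: (end, name) for start, end, name in slots}
    out = []
    i = 0
    while i < len(tokens):
        if i in spans:
            end, name = spans[i]
            out.append("{" + name + "}")
            i = end  # skip the whole slot value
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def build_reference_sets(examples):
    """Pair each delexicalized input with the other templates in its group.

    Assumed weak-equivalence rule: two utterances are treated as paraphrases
    when they share the same intent and the same set of slot types.
    """
    groups = defaultdict(set)
    for ex in examples:
        key = (ex["intent"], frozenset(name for _, _, name in ex["slots"]))
        groups[key].add(delexicalize(ex["tokens"], ex["slots"]))

    pairs = {}
    for templates in groups.values():
        for template in templates:
            pairs[template] = sorted(templates - {template})
    return pairs


# Hypothetical annotated utterances, invented for illustration only.
examples = [
    {"intent": "PlayMusic",
     "tokens": "play hello by adele".split(),
     "slots": [(1, 2, "song"), (3, 4, "artist")]},
    {"intent": "PlayMusic",
     "tokens": "i want to hear adele singing hello".split(),
     "slots": [(4, 5, "artist"), (6, 7, "song")]},
    {"intent": "GetWeather",
     "tokens": "weather in boston".split(),
     "slots": [(2, 3, "city")]},
]

reference_sets = build_reference_sets(examples)
for source, references in sorted(reference_sets.items()):
    print(source, "->", references)
```

Under this assumed rule, the two `PlayMusic` utterances become references for each other, while the `GetWeather` utterance has no references and would contribute no training pair. The CNN encoding of slot values and the pointer mechanism mentioned in the abstract are not covered by this sketch.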

