CL · Jan 11, 2019

ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation

arXiv:1901.03644v1 · 86 citations
Originality: Incremental advance
AI Analysis

ParaBank provides a resource for natural language processing tasks such as sentence rewriting. It is an incremental advance, however: it follows the ParaNMT approach and adds lexical constraints to the decoding step.

The authors address the problem of generating high-quality sentential paraphrases by building ParaBank, a large-scale English paraphrase dataset produced with lexically-constrained neural machine translation. The resource contains more than 4 billion generated tokens, and human judgments rate its paraphrases above prior work on both semantic similarity and fluency.

We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of ParaNMT, we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of English reference sentences. By adding lexical constraints to the NMT decoding procedure, however, we are able to produce multiple high-quality sentential paraphrases per source sentence, yielding an English paraphrase resource with more than 4 billion generated tokens and exhibiting greater lexical diversity. Using human judgments, we also demonstrate that ParaBank's paraphrases improve over ParaNMT on both semantic similarity and fluency. Finally, we use ParaBank to train a monolingual NMT model with the same support for lexically-constrained decoding for sentence rewriting tasks.
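The central mechanism in the abstract, adding lexical constraints to NMT decoding so that one source sentence yields several distinct paraphrases, can be illustrated with a minimal sketch. Lexical constraints can be positive (a token must appear) or negative (a token must not appear); this sketch shows only negative constraints, on the assumption that they most directly push the output away from the reference wording. Everything here is hypothetical and is not the authors' implementation: the toy `MODEL` table stands in for a Czech-English NMT system's next-token scores, decoding is greedy rather than beam search, and `constrained_greedy_decode` and the constraint sets are illustrative names.

```python
from typing import Dict, List, Set

# Hypothetical toy "model": previous token -> scores for candidate next tokens.
# A real NMT decoder would score the whole vocabulary, conditioned on the Czech source.
ToyModel = Dict[str, Dict[str, float]]

MODEL: ToyModel = {
    "<s>": {"the": 0.6, "one": 0.4},
    "the": {"man": 0.7, "guy": 0.3},
    "one": {"man": 0.6, "guy": 0.4},
    "man": {"bought": 0.8, "purchased": 0.2},
    "guy": {"bought": 0.6, "purchased": 0.4},
    "bought": {"a": 0.9, "some": 0.1},
    "purchased": {"a": 0.7, "some": 0.3},
    "a": {"car": 0.8, "vehicle": 0.2},
    "some": {"vehicle": 0.6, "car": 0.4},
    "car": {"</s>": 1.0},
    "vehicle": {"</s>": 1.0},
}


def constrained_greedy_decode(model: ToyModel, banned: Set[str], max_len: int = 10) -> List[str]:
    """Greedy decoding with negative lexical constraints: banned tokens are never emitted."""
    output: List[str] = []
    prev = "<s>"
    for _ in range(max_len):
        # Drop every candidate that violates a negative constraint.
        candidates = {tok: score for tok, score in model.get(prev, {}).items()
                      if tok not in banned}
        if not candidates:
            break  # every continuation is banned; stop early
        prev = max(candidates, key=candidates.get)  # pick the best allowed token
        if prev == "</s>":
            break
        output.append(prev)
    return output


if __name__ == "__main__":
    reference = "the man bought a car"
    # Hypothetical constraint sets, each banning some reference words.
    for banned in [set(), {"man"}, {"bought", "car"}, {"the", "man", "bought"}]:
        hypothesis = " ".join(constrained_greedy_decode(MODEL, banned))
        print(sorted(banned), "->", hypothesis, "| copies reference:", hypothesis == reference)
```

Without constraints the toy decoder simply reproduces the reference, while each banned set forces a different lexical choice ("the guy bought a car", "the man purchased a vehicle", "one guy purchased a car"). Practical constrained decoders enforce constraints inside beam search over the full vocabulary; greedy filtering is used here only to keep the example short.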

