LG, CL, ML | May 29, 2019

Unsupervised Paraphrasing without Translation

arXiv:1905.12752v11116 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for paraphrasing tools that do not require bilingual data. It is an incremental advance: it builds on existing methods while moving away from translation-based approaches.

The authors tackle automatic paraphrasing without relying on machine translation, proposing a monolingual method based on a residual vector-quantized variational auto-encoder. Monolingual paraphrasing outperformed unsupervised translation in all settings. Against supervised translation the results were mixed: the monolingual method did better on identification and augmentation but worse on generation.

Paraphrasing exemplifies the ability to abstract semantic content from surface forms. Recent work on automatic paraphrasing is dominated by methods leveraging Machine Translation (MT) as an intermediate step. This contrasts with humans, who can paraphrase without being bilingual. This work proposes to learn paraphrasing models from an unlabeled monolingual corpus only. To that end, we propose a residual variant of vector-quantized variational auto-encoder. We compare with MT-based approaches on paraphrase identification, generation, and training augmentation. Monolingual paraphrasing outperforms unsupervised translation in all settings. Comparisons with supervised translation are more mixed: monolingual paraphrasing is interesting for identification and augmentation; supervised translation is superior for generation.
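
For readers unfamiliar with the term, the sketch below illustrates the general idea behind residual vector quantization: a vector is approximated by a coarse codebook entry, and a second codebook then encodes what the first one missed. This is not the authors' model. It has no encoder, decoder, or training, and the codebooks, the input vector, and the function names (`nearest`, `residual_quantize`) are toy assumptions chosen only for illustration.

```python
# Illustrative sketch of residual vector quantization.
# NOT the paper's model: codebooks and input are made-up toy values,
# and nothing here is learned.

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def residual_quantize(vec, codebooks):
    """Quantize vec in stages; each stage encodes the remaining residual."""
    residual = list(vec)
    codes = []
    approx = [0.0] * len(vec)
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        entry = codebook[idx]
        codes.append(idx)
        # Accumulate the reconstruction and subtract what was just encoded.
        approx = [a + e for a, e in zip(approx, entry)]
        residual = [r - e for r, e in zip(residual, entry)]
    return codes, approx


# Hypothetical two-stage codebooks: a coarse one and a finer correction.
coarse = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
fine = [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]]

codes, approx = residual_quantize([1.2, 0.9], [coarse, fine])
print(codes, approx)
```

The sequence of indices in `codes` is the discrete representation; adding further stages refines the approximation without enlarging any single codebook.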

