Unsupervised Paraphrasing without Translation
This work addresses the need for paraphrasing tools that do not require bilingual data. The contribution is incremental: it builds on existing methods but moves away from translation-based approaches.
The authors tackled automatic paraphrasing without relying on machine translation, proposing a monolingual method based on a residual vector-quantized variational auto-encoder. The results showed that monolingual paraphrasing outperformed unsupervised translation in all settings. Against supervised translation the results were mixed: the monolingual model was better for paraphrase identification and training augmentation but worse for generation.
Paraphrasing exemplifies the ability to abstract semantic content from surface forms. Recent work on automatic paraphrasing is dominated by methods that use Machine Translation (MT) as an intermediate step. This contrasts with humans, who can paraphrase without being bilingual. This work proposes to learn paraphrasing models from an unlabeled monolingual corpus only. To that end, we propose a residual variant of the vector-quantized variational auto-encoder. We compare with MT-based approaches on paraphrase identification, generation, and training augmentation. Monolingual paraphrasing outperforms unsupervised translation in all settings. Comparisons with supervised translation are more mixed: monolingual paraphrasing is useful for identification and augmentation, while supervised translation is superior for generation.
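The abstract names the model but does not describe its internals. The following Python sketch shows the nearest-codebook lookup at the core of any vector-quantized auto-encoder, plus one assumed reading of "residual": the decoder receives the sum of the continuous encoder output and its quantized value. The function names `quantize` and `residual_quantize`, and this particular residual formulation, are illustrative assumptions rather than the authors' implementation. Training details such as codebook updates, commitment losses, and the straight-through gradient are omitted.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(h: Sequence[float], codebook: Sequence[Sequence[float]]) -> Tuple[int, Vector]:
    """Standard VQ step: return the index and value of the nearest codebook entry."""
    best = min(range(len(codebook)), key=lambda k: squared_distance(h, codebook[k]))
    return best, list(codebook[best])


def residual_quantize(h: Sequence[float], codebook: Sequence[Sequence[float]]) -> Tuple[int, Vector]:
    """Hypothetical residual variant (an assumption, not the paper's exact method):
    the decoder input is the continuous encoder output h plus its quantization q(h)."""
    idx, q = quantize(h, codebook)
    return idx, [x + y for x, y in zip(h, q)]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
    h = [0.75, 1.25]  # a toy encoder output for one token position
    print(quantize(h, codebook))           # nearest code and its value
    print(residual_quantize(h, codebook))  # continuous term added back
```

With this codebook and `h = [0.75, 1.25]`, `quantize` selects index 1 (value `[1.0, 1.0]`) and `residual_quantize` returns `[1.75, 2.25]`. Keeping the continuous term lets the decoder see information that the discrete code discards; whether the paper combines the two terms exactly this way is an assumption.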