CLLGFeb 6, 2023

Coherence and Diversity through Noise: Self-Supervised Paraphrase Generation via Structure-Aware Denoising

arXiv:2302.02780v11 citationsh-index: 28
Originality Incremental advance
AI Analysis

This addresses the need for plagiarism reduction and enhanced student understanding in online pedagogy, though it is incremental as it builds on existing denoising methods for a specific domain.

The paper tackles the problem of generating paraphrases for algebraic word problems to preserve solution-critical information while ensuring diversity, and demonstrates that SCANING improves semantic preservation and diversity across four datasets.

In this paper, we propose SCANING, an unsupervised framework for paraphrasing via controlled noise injection. We focus on the novel task of paraphrasing algebraic word problems having practical applications in online pedagogy as a means to reduce plagiarism as well as ensure understanding on the part of the student instead of rote memorization. This task is more complex than paraphrasing general-domain corpora due to the difficulty in preserving critical information for solution consistency of the paraphrased word problem, managing the increased length of the text and ensuring diversity in the generated paraphrase. Existing approaches fail to demonstrate adequate performance on at least one, if not all, of these facets, necessitating the need for a more comprehensive solution. To this end, we model the noising search space as a composition of contextual and syntactic aspects and sample noising functions consisting of either one or both aspects. This allows for learning a denoising function that operates over both aspects and produces semantically equivalent and syntactically diverse outputs through grounded noise injection. The denoising function serves as a foundation for learning a paraphrasing function which operates solely in the input-paraphrase space without carrying any direct dependency on noise. We demonstrate SCANING considerably improves performance in terms of both semantic preservation and producing diverse paraphrases through extensive automated and manual evaluation across 4 datasets.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes