CL · Mar 26, 2024

Task-Oriented Paraphrase Analytics

arXiv:2403.17564v181 citations · h-index: 32 · LREC
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of incomparable results across paraphrasing studies, but its contribution is incremental: it builds on existing literature without introducing new methods.

The paper tackles the problem of inconsistent definitions in paraphrasing research by proposing a taxonomy to organize 25 paraphrasing subtasks and using classifiers to show that existing paraphrase corpora have varying task-specific distributions, leading to incomparable results.

Since paraphrasing is an ill-defined task, the term "paraphrasing" covers text transformation tasks with different characteristics. Consequently, existing paraphrasing studies have applied quite different (explicit and implicit) criteria as to when a pair of texts is to be considered a paraphrase, all of which amount to postulating a certain level of semantic or lexical similarity. In this paper, we conduct a literature review and propose a taxonomy to organize the 25 identified paraphrasing (sub-)tasks. Using classifiers trained to identify the tasks that a given paraphrasing instance fits, we find that the distributions of task-specific instances in the known paraphrase corpora vary substantially. This means that the use of these corpora, without the respective paraphrase conditions being clearly defined (which is the normal case), must lead to incomparable and misleading results.
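
The corpus comparison described in the abstract can be illustrated with a minimal sketch. It assumes each paraphrase pair has already been assigned a single task label by some classifier, and it compares two corpora by the total variation distance between their label distributions. The task names, the toy label lists, and the choice of distance measure are all illustrative assumptions, not the paper's taxonomy, data, or method.

```python
from collections import Counter


def task_distribution(labels):
    """Return the relative frequency of each task label in a corpus."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {task: n / total for task, n in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    tasks = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tasks)


# Hypothetical classifier outputs: one predicted task label per paraphrase pair.
# The label names are placeholders, not taken from the paper.
corpus_a = ["simplification", "simplification", "style_transfer", "compression"]
corpus_b = ["compression", "compression", "compression", "style_transfer"]

dist_a = task_distribution(corpus_a)
dist_b = task_distribution(corpus_b)
distance = total_variation(dist_a, dist_b)

print(dist_a)
print(dist_b)
print(distance)
```

A large distance between two corpora would suggest that models trained or evaluated on them are effectively being measured on different mixtures of paraphrasing tasks, which is the comparability concern the abstract raises.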

Code Implementations: 1 repo
