CL · AI · May 24, 2023

Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting

arXiv:2305.14755v2 · 136 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generic and incoherent rewrites in stylistic text rewriting for NLP applications. The advance is incremental: it builds on existing methods by adding context.

The paper integrates the preceding textual context into both the rewriting and evaluation stages of stylistic text rewriting. It shows that humans significantly prefer contextual rewrites and that a new contextual metric (CtxSimFit) correlates well with human preferences (ρ=0.7–0.9), whereas existing sentence-level metrics correlate poorly (ρ=0–0.3).

Most existing stylistic text rewriting methods and evaluation metrics operate on a sentence level, but ignoring the broader context of the text can lead to preferring generic, ambiguous, and incoherent rewrites. In this paper, we investigate integrating the preceding textual context into both the $\textit{rewriting}$ and $\textit{evaluation}$ stages of stylistic text rewriting, and introduce a new composite contextual evaluation metric $\texttt{CtxSimFit}$ that combines similarity to the original sentence with contextual cohesiveness. We comparatively evaluate non-contextual and contextual rewrites in formality, toxicity, and sentiment transfer tasks. Our experiments show that humans significantly prefer contextual rewrites as more fitting and natural over non-contextual ones, yet existing sentence-level automatic metrics (e.g., ROUGE, SBERT) correlate poorly with human preferences ($ρ$=0--0.3). In contrast, human preferences are much better reflected by both our novel $\texttt{CtxSimFit}$ ($ρ$=0.7--0.9) as well as proposed context-infused versions of common metrics ($ρ$=0.4--0.7). Overall, our findings highlight the importance of integrating context into the generation and especially the evaluation stages of stylistic text rewriting.
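The abstract describes CtxSimFit only as a composite of similarity to the original sentence and contextual cohesiveness. A minimal sketch of such a composite score follows, assuming a simple weighted sum. The weight `alpha`, the function names, and the token-overlap scorer are all hypothetical placeholders for illustration; in practice learned similarity and cohesiveness models would take their place, and this is not presented as the authors' implementation.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def token_jaccard(a: str, b: str) -> float:
    """Toy scorer (assumption): Jaccard overlap of lowercase tokens, in [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def composite_contextual_score(
    context: str,
    original: str,
    rewrite: str,
    alpha: float = 0.5,          # assumed weighting, not taken from the abstract
    sim: Scorer = token_jaccard,  # similarity of the rewrite to the original sentence
    fit: Scorer = token_jaccard,  # cohesiveness of the rewrite with the preceding context
) -> float:
    """Weighted sum of sentence similarity and contextual fit."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim(original, rewrite) + (1.0 - alpha) * fit(context, rewrite)


if __name__ == "__main__":
    context = "the meeting is at noon"
    original = "see ya at the meeting"
    rewrite = "see you at the meeting"
    print(composite_contextual_score(context, original, rewrite))
```

Setting `alpha=1.0` reduces the score to a purely sentence-level metric, which is the setting the abstract reports as correlating poorly with human preferences; lowering `alpha` gives the preceding context more influence.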

