CL · Sep 24, 2018

Sentence-Level Fluency Evaluation: References Help, But Can Be Spared!

arXiv:1809.08731v11119 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and accurate fluency evaluation in natural language processing. It offers both referenceless and reference-based metrics that improve upon existing methods, though the advance is incremental.

The paper tackles the problem of evaluating sentence-level fluency in natural language generation by proposing SLOR and WPSLOR as referenceless metrics, which achieve significantly higher correlation with human fluency scores than word-overlap metrics like ROUGE on a benchmark dataset. It also introduces ROUGE-LM, a reference-based metric that outperforms all baselines, including WPSLOR, in correlation with human judgments.

Motivated by recent findings on the probabilistic modeling of acceptability judgments, we propose syntactic log-odds ratio (SLOR), a normalized language model score, as a metric for referenceless fluency evaluation of natural language generation output at the sentence level. We further introduce WPSLOR, a novel WordPiece-based version, which harnesses a more compact language model. Even though word-overlap metrics like ROUGE are computed with the help of hand-written references, our referenceless methods obtain a significantly higher correlation with human fluency scores on a benchmark dataset of compressed sentences. Finally, we present ROUGE-LM, a reference-based metric which is a natural extension of WPSLOR to the case of available references. We show that ROUGE-LM yields a significantly higher correlation with human judgments than all baseline metrics, including WPSLOR on its own.
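The abstract describes SLOR only as "a normalized language model score" and does not give a formula. As a minimal sketch, assuming the definition commonly used in the acceptability-judgment literature, SLOR subtracts the sentence's unigram log-probability from its language model log-probability and divides by sentence length: SLOR(S) = (ln p_M(S) - ln p_u(S)) / |S|. The function name `slor` and the per-token inputs below are illustrative assumptions, not the authors' code; the log-probabilities would come from whatever language model and unigram counts are available. Under the same assumption, WPSLOR would apply the identical computation to WordPiece tokens rather than words.

```python
import math
from typing import Sequence


def slor(lm_logprobs: Sequence[float], unigram_logprobs: Sequence[float]) -> float:
    """Length-normalized difference between LM and unigram log-probabilities.

    lm_logprobs[i]      : ln p_M(token_i | preceding tokens), from a language model
    unigram_logprobs[i] : ln p_u(token_i), from unigram frequencies
    Both sequences are hypothetical inputs supplied by the caller.
    """
    if len(lm_logprobs) != len(unigram_logprobs):
        raise ValueError("both sequences must cover the same tokens")
    if not lm_logprobs:
        raise ValueError("sentence must contain at least one token")
    n = len(lm_logprobs)
    # ln p_M(S) and ln p_u(S) are sums of the per-token log-probabilities.
    return (sum(lm_logprobs) - sum(unigram_logprobs)) / n


if __name__ == "__main__":
    # Toy numbers, invented purely for illustration.
    lm = [math.log(0.2), math.log(0.5), math.log(0.1)]
    uni = [math.log(0.01), math.log(0.05), math.log(0.001)]
    print(f"SLOR = {slor(lm, uni):.3f}")
```

Subtracting the unigram term keeps a sentence from being penalized merely for containing rare words, and dividing by length keeps longer sentences from scoring lower simply because they have more tokens.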
