CL | Sep 16, 2019

Communication-based Evaluation for Natural Language Generation

Benjamin Newman, Reuben Cohn-Gordon, Christopher Potts

arXiv:1909.07290v230.1999 citations | h-index: 48 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the problem of evaluating NLG systems more effectively, a concern for both researchers and practitioners. The advance is incremental because it builds on existing pragmatic models.

The paper tackles the misalignment of n-gram overlap measures like BLEU and ROUGE with true goals in natural language generation by proposing communication-based evaluations using the Rational Speech Acts model, showing that this method better aligns with pre-defined quality categories on a color reference dataset.

Natural language generation (NLG) systems are commonly evaluated using n-gram overlap measures (e.g. BLEU, ROUGE). These measures do not directly capture semantics or speaker intentions, and so they often turn out to be misaligned with our true goals for NLG. In this work, we argue instead for communication-based evaluations: assuming the purpose of an NLG system is to convey information to a reader/listener, we can directly evaluate its effectiveness at this task using the Rational Speech Acts model of pragmatic language use. We illustrate with a color reference dataset that contains descriptions in pre-defined quality categories, showing that our method better aligns with these quality categories than do any of the prominent n-gram overlap methods.
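
As an illustration of what a communication-based score could look like, the following is a minimal sketch of a Rational Speech Acts listener over a toy color reference task. It is not the authors' implementation: the lexicon values, the chip names, the `alpha` rationality parameter, and all function names are assumptions made up for this example. The score for an utterance is taken to be the probability that a pragmatic listener recovers the intended target from it.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


def literal_listener(lexicon, utterance, prior):
    """L0(target | utterance): lexicon applicability times prior, normalized."""
    return normalize({t: lexicon[utterance][t] * prior[t] for t in prior})


def pragmatic_speaker(lexicon, target, prior, alpha=1.0):
    """S1(utterance | target): proportional to L0(target | utterance) ** alpha."""
    scores = {
        u: literal_listener(lexicon, u, prior)[target] ** alpha
        for u in lexicon
    }
    return normalize(scores)


def pragmatic_listener(lexicon, utterance, prior, alpha=1.0):
    """L1(target | utterance): proportional to S1(utterance | target) * prior."""
    scores = {
        t: pragmatic_speaker(lexicon, t, prior, alpha)[utterance] * prior[t]
        for t in prior
    }
    return normalize(scores)


def communication_score(lexicon, utterance, target, prior, alpha=1.0):
    """Probability that the pragmatic listener picks the intended target."""
    return pragmatic_listener(lexicon, utterance, prior, alpha)[target]


# Hypothetical toy lexicon: how well each description applies to each chip.
LEXICON = {
    "blue": {"blue_chip": 1.0, "teal_chip": 0.5, "green_chip": 0.0},
    "green": {"blue_chip": 0.0, "teal_chip": 0.5, "green_chip": 1.0},
    "teal": {"blue_chip": 0.0, "teal_chip": 1.0, "green_chip": 0.0},
}
PRIOR = {"blue_chip": 1 / 3, "teal_chip": 1 / 3, "green_chip": 1 / 3}

if __name__ == "__main__":
    for utterance in LEXICON:
        score = communication_score(LEXICON, utterance, "teal_chip", PRIOR)
        print(f"{utterance!r} describing teal_chip: {score:.3f}")
```

Under this toy setup, the description "teal" scores higher for the teal chip than "blue" does, because "blue" also fits a competing chip. An n-gram overlap measure would instead score each description only by its surface match to a reference string, which is the contrast the abstract draws.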
