CL, AI, LG · Sep 20, 2019

Towards Neural Language Evaluators

arXiv:1909.09268v2 · 2 citations
AI Analysis

This work aims to improve evaluation metrics for text summarization. The contribution is incremental, as it builds on existing methods.

The paper addresses limitations of BLEU and ROUGE for evaluating summaries by proposing criteria for good metrics and by using Transformer-based language models to compare hypothesis summaries against reference summaries. It does not report specific numerical results.

We review three limitations of BLEU and ROUGE, the most popular metrics used to assess reference summaries against hypothesis summaries; come up with criteria for how a good metric should behave; and propose concrete ways to use recent Transformer-based language models to assess reference summaries against hypothesis summaries.
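The abstract does not list the three limitations it reviews. As a rough illustration of the kind of problem surface-overlap metrics are commonly criticized for, the sketch below computes a simplified unigram-overlap F1, loosely in the spirit of ROUGE-1. The function name `unigram_f1`, the whitespace tokenization, and the example sentences are all assumptions made for this illustration. They are not taken from the paper, and this is not the official ROUGE implementation.

```python
from collections import Counter


def unigram_f1(reference: str, hypothesis: str) -> float:
    """Simplified unigram-overlap F1 (hypothetical helper, not official ROUGE)."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())

    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref_counts & hyp_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
paraphrase = "a feline rested upon a rug"  # similar meaning, no shared words
scrambled = "mat the on sat cat the"       # same words, meaningless order

paraphrase_score = unigram_f1(reference, paraphrase)
scrambled_score = unigram_f1(reference, scrambled)

print(f"paraphrase: {paraphrase_score:.2f}")
print(f"scrambled:  {scrambled_score:.2f}")
```

Under these assumptions, the paraphrase shares no tokens with the reference and receives the lowest possible score, while the scrambled word salad receives a perfect one. A metric based on a language model could, in principle, separate the two cases.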
