CL · May 29, 2018

Human vs Automatic Metrics: on the Importance of Correlation Design

arXiv:1805.11474v3 · 17 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses a methodological issue for researchers in natural language generation evaluation, but its contribution is incremental, since it builds on existing correlation approaches.

The paper investigates the inconsistency in correlation results between automatic evaluation metrics and human judgments in natural language generation, depending on whether system-level or sentence-level analysis is used.

This paper discusses two existing approaches to the correlation analysis between automatic evaluation metrics and human scores in the area of natural language generation. Our experiments show that depending on the usage of a system- or sentence-level correlation analysis, correlation results between automatic scores and human judgments are inconsistent.
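The difference between the two designs can be illustrated with a small sketch. It assumes that system-level analysis correlates per-system mean scores and that sentence-level analysis correlates scores pooled over all individual outputs; the paper's exact definitions may differ. The system names and all scores below are made-up toy values, not data from the paper, and `pearson`, `system_level`, and `sentence_level` are hypothetical helper names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    if denom == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / denom


def mean(values):
    return sum(values) / len(values)


# Toy data (invented for illustration): for each system, a list of
# (automatic metric score, human score) pairs, one pair per output sentence.
SCORES = {
    "sys_a": [(0.2, 3), (0.3, 2), (0.4, 1)],
    "sys_b": [(0.5, 5), (0.6, 4), (0.7, 3)],
    "sys_c": [(0.8, 7), (0.9, 6), (1.0, 5)],
}


def system_level(scores):
    """Correlate per-system mean metric scores with per-system mean human scores."""
    metric_means = [mean([m for m, _ in pairs]) for pairs in scores.values()]
    human_means = [mean([h for _, h in pairs]) for pairs in scores.values()]
    return pearson(metric_means, human_means)


def sentence_level(scores):
    """Correlate metric and human scores pooled over every individual sentence."""
    pooled = [pair for pairs in scores.values() for pair in pairs]
    return pearson([m for m, _ in pooled], [h for _, h in pooled])


if __name__ == "__main__":
    print(f"system-level r:   {system_level(SCORES):.3f}")
    print(f"sentence-level r: {sentence_level(SCORES):.3f}")
```

On this toy data the system-level correlation is close to 1, while the pooled sentence-level correlation is noticeably lower, because within each system the metric and the human scores move in opposite directions. The same metric can therefore look strong or weak depending on which design is reported.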

Code Implementations: 1 repo