CL · Jan 5, 2021

On the interaction of automatic evaluation and task framing in headline style transfer

arXiv:2101.01634v1841 citations
Originality: Incremental advance
AI Analysis

This work is significant for researchers and practitioners in Natural Language Generation (NLG) who struggle with reliable evaluation of style transfer tasks, offering an improved automatic evaluation method.

The paper addresses the challenge of evaluating style transfer systems, particularly for subtle textual differences like headline style transfer. It proposes a classifier-based evaluation method that is shown to better reflect system differences compared to traditional metrics like BLEU and ROUGE.

An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method for this task based on purposely-trained classifiers, showing that it better reflects system differences than traditional metrics such as BLEU and ROUGE.
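The classifier-based evaluation described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy naive Bayes model stands in for whatever purposely-trained classifiers the paper actually uses, and the style labels (`tabloid`, `broadsheet`), the training headlines, and the two system outputs are invented. The idea shown is only the general one: train a classifier to tell the styles apart, then score each system by how often its outputs are recognised as the target style.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization; a deliberate simplification.
    return text.lower().split()


class NaiveBayesStyleClassifier:
    """Toy multinomial naive Bayes over bag-of-words with add-one smoothing.

    A stand-in for a purposely-trained style classifier, not the paper's model.
    """

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.word_counts = {label: Counter() for label in self.labels}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.labels:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def style_accuracy(classifier, outputs, target_style):
    """Fraction of system outputs the classifier assigns to the target style."""
    if not outputs:
        return 0.0
    hits = sum(classifier.predict(o) == target_style for o in outputs)
    return hits / len(outputs)


# Hypothetical training headlines for two invented styles.
train_texts = [
    "shocking scandal rocks city hall",
    "you won't believe this shocking deal",
    "council approves annual budget plan",
    "ministry publishes annual trade report",
]
train_labels = ["tabloid", "tabloid", "broadsheet", "broadsheet"]

clf = NaiveBayesStyleClassifier().fit(train_texts, train_labels)

# Hypothetical outputs of two style transfer systems targeting "tabloid".
system_a = ["shocking budget scandal", "shocking trade deal"]
system_b = ["council publishes annual report", "shocking trade deal"]

score_a = style_accuracy(clf, system_a, "tabloid")
score_b = style_accuracy(clf, system_b, "tabloid")
print(f"system A: {score_a:.2f}, system B: {score_b:.2f}")
```

On this toy data the classifier separates the two systems even though their outputs share much of their vocabulary, which is the kind of system difference that overlap metrics such as BLEU and ROUGE may blur. A realistic setup would need held-out data and a much stronger classifier.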

Code Implementations: 1 repo