CL, LG | Jun 5, 2020

Human or Machine: Automating Human Likeliness Evaluation of NLG Texts

arXiv:2006.03189v1 | 7 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of costly and slow human evaluation for NLG tasks, though it appears incremental because it builds on existing language models for automation.

The paper tackles automating the evaluation of how human-like natural language generation (NLG) texts are by proposing a human likeliness score based on large pretrained language models, aiming to replace human labeling with a discrimination procedure.

Automatic evaluation of various text quality criteria produced by data-driven intelligent methods is very common and useful because it is cheap, fast, and usually yields repeatable results. In this paper, we present an attempt to automate the human likeliness evaluation of the output text samples coming from natural language generation methods used to solve several tasks. We propose to use a human likeliness score that shows the percentage of the output samples from a method that look as if they were written by a human. Instead of having human participants label or rate those samples, we completely automate the process by using a discrimination procedure based on large pretrained language models and their probability distributions. As a follow-up, we plan to perform an empirical analysis of human-written and machine-generated texts to find the optimal setup of this evaluation approach. A validation procedure involving human participants will also check how the automatic evaluation correlates with human judgments.
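
The score described in the abstract is a percentage: the share of a method's output samples that a discriminator judges to be human-written. The sketch below illustrates only that arithmetic. The abstract does not specify the discrimination procedure, so the discriminator here is an assumption: it treats a sample as human-like when the mean of its per-token log-probabilities (taken to be precomputed by some pretrained language model) falls inside a band. The names `mean_logprob_discriminator` and `human_likeliness_score`, the band limits, and the toy log-probabilities are all hypothetical.

```python
from statistics import fmean
from typing import Callable, Sequence

# Per-token log-probabilities of one text sample, assumed to come from
# some pretrained language model (not computed here).
TokenLogProbs = Sequence[float]


def mean_logprob_discriminator(low: float, high: float) -> Callable[[TokenLogProbs], bool]:
    """Hypothetical discriminator: 'human-like' if the mean token
    log-probability lies in [low, high]. The band is an assumption,
    not the paper's procedure."""
    def looks_human(token_logprobs: TokenLogProbs) -> bool:
        return low <= fmean(token_logprobs) <= high
    return looks_human


def human_likeliness_score(samples: Sequence[TokenLogProbs],
                           looks_human: Callable[[TokenLogProbs], bool]) -> float:
    """Percentage of samples that the discriminator labels as human-written."""
    if not samples:
        raise ValueError("need at least one sample")
    hits = sum(1 for sample in samples if looks_human(sample))
    return 100.0 * hits / len(samples)


# Toy data: four samples with made-up token log-probabilities.
samples = [
    [-2.0, -3.0, -4.0],  # mean -3.0, inside the band
    [-0.1, -0.2, -0.3],  # mean -0.2, above the band
    [-2.5, -3.5],        # mean -3.0, inside the band
    [-9.0, -8.0],        # mean -8.5, below the band
]
score = human_likeliness_score(samples, mean_logprob_discriminator(-5.0, -1.0))
print(score)  # 50.0
```

Passing the discriminator in as a function keeps the percentage computation separate from the choice of discrimination procedure, which the abstract leaves open pending the planned empirical analysis.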

