CL · Oct 6, 2021

Cut the CARP: Fishing for zero-shot story evaluation

arXiv:2110.03111v3 · 16 citations
Originality: Incremental advance
AI Analysis

It addresses the expensive or inadequate evaluation of narrative text in NLP, though the contribution is incremental because it builds on existing advances in contrastive learning.

The paper tackles the evaluation of machine-generated stories by introducing CARP, a zero-shot method whose scores correlate strongly with human evaluations and outperform language-model baselines that rely on fine-tuning or prompt engineering.

Recent advances in large-scale language models (Raffel et al., 2019; Brown et al., 2020) have brought significant qualitative and quantitative improvements in machine-driven text generation. Despite this, generating narrative text and evaluating machine-generated stories remain challenging problems. Objective evaluation of computationally generated stories may be prohibitively expensive, may require meticulously annotated datasets, or may not adequately measure the logical coherence of a generated story's narratological structure. Informed by recent advances in contrastive learning (Radford et al., 2021), we present Contrastive Authoring and Reviewing Pairing (CARP): a scalable, efficient method for performing qualitatively superior, zero-shot evaluation of stories. We show a strong correlation between human evaluations of stories and those of CARP. CARP's outputs correlate more strongly with the corresponding human judgments than do those of language-model-based methods that rely on fine-tuning or prompt engineering. We also present and analyze the Story-Critique Dataset, a new corpus composed of 1.3 million aligned story-critique pairs derived from over 80,000 stories. We expect this corpus to be of interest to NLP researchers.
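The pairing idea behind CARP can be illustrated with a CLIP-style contrastive objective (the approach of Radford et al., 2021): embeddings of matched story-critique pairs are pulled together while every other pairing in a batch acts as a negative. The sketch below is a minimal pure-Python illustration under stated assumptions, not CARP's implementation. The story and critique encoders are omitted (embeddings are given as plain vectors), the temperature of 0.07 is an assumed value, and the zero-shot step, which ranks one story against a set of critique-label embeddings, is a hypothetical use modeled on CLIP-style zero-shot classification. All function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a (nonzero) vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(stories: Sequence[Sequence[float]],
                      critiques: Sequence[Sequence[float]],
                      temperature: float = 0.07) -> List[Vector]:
    """logits[i][j] = cos(story_i, critique_j) / temperature (temperature is an assumed value)."""
    s = [normalize(v) for v in stories]
    c = [normalize(v) for v in critiques]
    return [[sum(a * b for a, b in zip(si, cj)) / temperature for cj in c] for si in s]


def log_softmax(row: Sequence[float]) -> Vector:
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(stories: Sequence[Sequence[float]],
                     critiques: Sequence[Sequence[float]],
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: story i is paired with critique i; other pairs are negatives."""
    logits = similarity_matrix(stories, critiques, temperature)
    n = len(logits)
    # Story -> critique direction: cross-entropy of each row against its diagonal entry.
    loss_story = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Critique -> story direction: the same computation over the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_critique = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_story + loss_critique) / 2


def zero_shot_scores(story: Sequence[float],
                     labels: Sequence[Sequence[float]],
                     temperature: float = 0.07) -> Vector:
    """Hypothetical zero-shot step: softmax over a story's similarity to each critique label."""
    row = similarity_matrix([story], labels, temperature)[0]
    return [math.exp(x) for x in log_softmax(row)]


if __name__ == "__main__":
    stories = [[1.0, 0.0], [0.0, 1.0]]
    critiques = [[0.9, 0.1], [0.1, 0.9]]
    print(contrastive_loss(stories, critiques))        # aligned pairs: low loss
    print(contrastive_loss(stories, critiques[::-1]))  # mismatched pairs: high loss
    print(zero_shot_scores([1.0, 0.0], critiques))
```

With these toy vectors the loss is much lower for the aligned pairing than for the reversed one, and `zero_shot_scores([1.0, 0.0], critiques)` places almost all of the probability on the first critique label. Averaging the two directions keeps the objective symmetric, so neither encoder is privileged during training.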
