CL · Oct 6, 2021

Cut the CARP: Fishing for zero-shot story evaluation

Shahbuland Matiana, JR Smith, Ryan Teehan, Louis Castricato, Stella Biderman, Leo Gao, Spencer Frazier

arXiv:2110.03111v3 · 2.016 citations

Originality: Incremental advance

AI Analysis

This paper addresses the problem that evaluating narrative text in NLP is often expensive or inadequate. The contribution is incremental, as it builds on recent advances in contrastive learning.

The paper tackles the problem of evaluating machine-generated stories by introducing CARP, a zero-shot method that shows strong correlation with human evaluations, outperforming fine-tuned or prompt-engineered language models.

Recent advances in large-scale language models (Raffel et al., 2019; Brown et al., 2020) have brought significant qualitative and quantitative improvements in machine-driven text generation. Despite this, the generation and evaluation of machine-generated narrative text remain challenging problems. Objective evaluation of computationally generated stories may be prohibitively expensive, may require meticulously annotated datasets, or may not adequately measure the logical coherence of a generated story's narratological structure. Informed by recent advances in contrastive learning (Radford et al., 2021), we present Contrastive Authoring and Reviewing Pairing (CARP): a scalable, efficient method for performing qualitatively superior, zero-shot evaluation of stories. We show a strong correlation between human evaluations of stories and those of CARP. CARP's outputs correlate more strongly with the corresponding human input than do language-model-based methods that rely on finetuning or prompt engineering. We also present and analyze the Story-Critique Dataset, a new corpus composed of 1.3 million aligned story-critique pairs derived from over 80,000 stories. We expect this corpus to be of interest to NLP researchers.
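
The abstract does not spell out CARP's training objective or scoring rule, so the following is only a minimal sketch of the general idea, assuming a CLIP-style symmetric contrastive objective in the spirit of Radford et al., 2021. Every name here (`contrastive_loss`, `zero_shot_scores`), the temperature of 0.07, and the hand-written toy vectors standing in for encoder outputs are assumptions made for illustration, not details taken from the paper.

```python
import math

def normalize(v):
    # Scale a vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_matrix(stories, critiques):
    # Entry [i][j] is the cosine similarity of story i and critique j.
    s = [normalize(v) for v in stories]
    c = [normalize(v) for v in critiques]
    return [[sum(a * b for a, b in zip(sv, cv)) for cv in c] for sv in s]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(stories, critiques, temperature=0.07):
    # Symmetric cross-entropy over the similarity matrix: story i should
    # match critique i, and critique i should match story i.
    # The temperature value is an assumed hyperparameter.
    sims = cosine_matrix(stories, critiques)
    logits = [[x / temperature for x in row] for row in sims]
    n = len(logits)
    loss_story = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    columns = [list(col) for col in zip(*logits)]
    loss_critique = -sum(log_softmax(columns[i])[i] for i in range(n)) / n
    return (loss_story + loss_critique) / 2

def zero_shot_scores(story, critiques, temperature=0.07):
    # Softmax over similarities between one story embedding and a set of
    # candidate critique embeddings; higher means a closer match.
    sims = cosine_matrix([story], critiques)[0]
    return [math.exp(x) for x in log_softmax([s / temperature for s in sims])]

# Toy embeddings (hypothetical stand-ins for encoder outputs).
stories = [[1.0, 0.0], [0.0, 1.0]]
critiques = [[1.0, 0.1], [0.1, 1.0]]
aligned = contrastive_loss(stories, critiques)
shuffled = contrastive_loss(stories, critiques[::-1])
scores = zero_shot_scores(stories[0], critiques)
```

In this toy setup the aligned pairing yields a lower loss than the shuffled one, and `scores` ranks the first critique above the second for the first story. A real system would replace the toy vectors with embeddings from trained story and critique encoders.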
