CL LG ML | Oct 16, 2022

Model Criticism for Long-Form Text Generation

Yuntian Deng, Volodymyr Kuleshov, Alexander M. Rush

AI2

arXiv:2210.08444v1 | 24.5299 citations | h-index: 60 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of assessing the quality of AI-generated text for users who rely on it, though it is incremental in that it applies existing statistical tools to a new domain.

The paper tackles the problem of evaluating high-level structure in long-form text generated by language models. It finds that transformer-based models capture topical structure but struggle to maintain coherence and to model coreference.

Language models have demonstrated the ability to generate highly fluent text; however, it remains unclear whether their output retains coherent high-level structure (e.g., story progression). Here, we propose to apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of the generated text. Model criticism compares the distributions between real and generated data in a latent space obtained according to an assumptive generative process. Different generative processes identify specific failure modes of the underlying model. We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality -- and find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
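The comparison described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes each document has already been mapped to a sequence of discrete latent labels (the section names here are hypothetical), uses a first-order Markov chain as the assumed generative process, and uses the gap in average negative log-likelihood as one possible discrepancy measure. The function names `fit_markov_critic` and `latent_nll` are invented for this example.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"

def fit_markov_critic(sequences, states, alpha=1.0):
    """Fit a first-order Markov chain over latent labels (add-alpha smoothing).

    Assumption: this chain plays the role of the critic's generative process.
    """
    counts, context = Counter(), Counter()
    for seq in sequences:
        path = [START] + list(seq) + [END]
        for prev, nxt in zip(path, path[1:]):
            counts[(prev, nxt)] += 1
            context[prev] += 1
    n_targets = len(states) + 1  # every state plus the END symbol

    def log_prob(prev, nxt):
        return math.log((counts[(prev, nxt)] + alpha)
                        / (context[prev] + alpha * n_targets))

    return log_prob

def latent_nll(sequences, log_prob):
    """Average negative log-likelihood per transition under the critic."""
    total, n = 0.0, 0
    for seq in sequences:
        path = [START] + list(seq) + [END]
        total -= sum(log_prob(p, q) for p, q in zip(path, path[1:]))
        n += len(path) - 1
    return total / n

# Hypothetical latent sequences: real documents follow a fixed section order,
# while the "generated" ones scramble it.
states = ["intro", "body", "end"]
real = [["intro", "body", "end"]] * 20
generated = [["body", "intro", "end"], ["end", "end", "intro"]]

critic = fit_markov_critic(real, states)
real_nll = latent_nll(real, critic)
gen_nll = latent_nll(generated, critic)
print(f"real: {real_nll:.3f}  generated: {gen_nll:.3f}")
```

A generated-data score well above the real-data score suggests that the generator fails to reproduce the structure this particular critic encodes. Swapping in a different generative process would probe a different failure mode.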
