CL · Nov 10, 2021

A Novel Corpus of Discourse Structure in Humans and Computers

arXiv:2111.05940v1
Originality: Synthesis-oriented
AI Analysis

The corpus provides a resource for analyzing text generation quality, but the contribution is incremental, since it builds on existing models and annotation frameworks.

The researchers address the problem of comparing human- and computer-generated discourse by building a corpus of 445 documents (about 27,000 clauses) annotated for semantic clause types and coherence relations. They report preliminary evidence that fewer, shorter, and more often incoherent clause relations are associated with lower perceived quality in computer-generated narratives and arguments.

We present a novel corpus of 445 human- and computer-generated documents, comprising about 27,000 clauses, annotated for semantic clause types and coherence relations that allow for nuanced comparison of artificial and natural discourse modes. The corpus covers both formal and informal discourse, and contains documents generated using fine-tuned GPT-2 (Zellers et al., 2019) and GPT-3 (Brown et al., 2020). We showcase the usefulness of this corpus for detailed discourse analysis of text generation by providing preliminary evidence that less numerous, shorter, and more often incoherent clause relations are associated with lower perceived quality of computer-generated narratives and arguments.
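The three relation properties that the abstract associates with perceived quality (how many relations a document has, how long they are, and how often they are flagged as incoherent) can be summarized per document once the annotations are loaded. The sketch below is illustrative only: the `Relation` record, its fields, and the `relation_stats` helper are hypothetical and not the corpus's actual schema, and it assumes that "length" means the clause distance between the two linked clauses. The paper's own definitions may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Relation:
    """One coherence relation between two clauses (hypothetical schema)."""
    source: int               # index of the first clause in the document
    target: int               # index of the second clause in the document
    label: str                # relation type, e.g. "elaboration"
    incoherent: bool = False  # True if annotators flagged the relation as incoherent


@dataclass
class RelationStats:
    count: int              # number of annotated relations
    mean_length: float      # average clause distance spanned by a relation
    incoherent_rate: float  # share of relations flagged as incoherent


def relation_stats(relations: List[Relation]) -> RelationStats:
    """Summarize a document's relations by count, mean length and incoherent share.

    Assumption: relation length = |target - source| in clauses.
    """
    if not relations:
        return RelationStats(count=0, mean_length=0.0, incoherent_rate=0.0)
    lengths = [abs(r.target - r.source) for r in relations]
    n_incoherent = sum(1 for r in relations if r.incoherent)
    return RelationStats(
        count=len(relations),
        mean_length=sum(lengths) / len(relations),
        incoherent_rate=n_incoherent / len(relations),
    )


# Usage with a made-up three-relation document (toy input, not corpus data).
doc = [
    Relation(0, 1, "elaboration"),
    Relation(1, 4, "cause"),
    Relation(2, 3, "contrast", incoherent=True),
]
print(relation_stats(doc))
```

Under the hedged reading of the abstract, documents with lower values for `count` and `mean_length` and a higher `incoherent_rate` would tend to receive lower quality ratings. Testing that association on the real corpus would mean pairing these per-document statistics with its quality judgments.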

Code Implementations: 1 repo
