CL · Nov 10, 2021

A Novel Corpus of Discourse Structure in Humans and Computers

Babak Hemmatian, Sheridan Feucht, Rachel Avram, Alexander Wey, Muskaan Garg, Kate Spitalnic, Carsten Eickhoff, Ellie Pavlick, Bjorn Sandstede, Steven Sloman

arXiv:2111.05940v1

Originality: Synthesis-oriented

AI Analysis

The corpus provides a resource for analyzing text generation quality, but the contribution is incremental because it builds on existing models and annotation frameworks.

The researchers compared human and computer-generated discourse by creating a corpus of 445 documents with about 27,000 clauses, annotated for semantic clause types and coherence relations. They report preliminary evidence that fewer, shorter, and more often incoherent clause relations are associated with lower perceived quality in computer-generated narratives and arguments.

We present a novel corpus of 445 human- and computer-generated documents, comprising about 27,000 clauses, annotated for semantic clause types and coherence relations that allow for nuanced comparison of artificial and natural discourse modes. The corpus covers both formal and informal discourse, and contains documents generated using fine-tuned GPT-2 (Zellers et al., 2019) and GPT-3 (Brown et al., 2020). We showcase the usefulness of this corpus for detailed discourse analysis of text generation by providing preliminary evidence that fewer, shorter, and more often incoherent clause relations are associated with lower perceived quality of computer-generated narratives and arguments.
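
The three relation features named in the abstract (how many relations a document has, how long they are, and how often they are incoherent) could be computed per document roughly as follows. This is only a minimal sketch: the `Relation` record, its fields, and the reading of "shorter" as a span measured in clauses are all assumptions made for illustration, not the corpus's actual annotation format or the authors' method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Relation:
    # Hypothetical schema; the real corpus may store relations differently.
    start: int       # index of the first clause the relation covers
    end: int         # index of the last clause the relation covers
    coherent: bool   # False if the relation was annotated as incoherent


def relation_stats(relations):
    """Return the count, mean span (in clauses), and incoherent share."""
    if not relations:
        return {"count": 0, "mean_span": 0.0, "incoherent_share": 0.0}
    # Span length is assumed to be the inclusive number of clauses covered.
    spans = [r.end - r.start + 1 for r in relations]
    incoherent = sum(1 for r in relations if not r.coherent)
    return {
        "count": len(relations),
        "mean_span": mean(spans),
        "incoherent_share": incoherent / len(relations),
    }


# Toy document with four made-up relations, one of them incoherent.
doc = [
    Relation(0, 1, True),
    Relation(1, 4, True),
    Relation(4, 5, False),
    Relation(5, 6, True),
]
stats = relation_stats(doc)
print(stats)
```

Per-document summaries like these could then be set against perceived-quality ratings; the toy relations here are invented and carry no information about the real data.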
