CL · AI · Feb 16, 2024

Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs

DeepMind
arXiv:2402.10586v2 · 34 citations · h-index: 22 · Has Code · ACL
AI Analysis

This paper addresses the challenge of detecting AI-generated content, which is crucial for security and authenticity. The contribution is incremental, however, because it builds on existing discourse analysis methods.

The paper tackles the problem of distinguishing machine-generated from human-written texts by analyzing discourse patterns, and finds that hierarchical discourse features improve classifier performance on out-of-distribution and paraphrased samples.

With the advent of large language models (LLMs), the line between human-crafted and machine-generated texts has become increasingly blurred. This paper delves into the inquiry of identifying discernible and unique linguistic properties in texts that were written by humans, particularly uncovering the underlying discourse structures of texts beyond their surface structures. Introducing a novel methodology, we leverage hierarchical parse trees and recursive hypergraphs to unveil distinctive discourse patterns in texts produced by both LLMs and humans. Empirical findings demonstrate that, although both LLMs and humans generate distinct discourse patterns influenced by specific domains, human-written texts exhibit more structural variability, reflecting the nuanced nature of human writing in different domains. Notably, incorporating hierarchical discourse features enhances binary classifiers' overall performance in distinguishing between human-written and machine-generated texts, even on out-of-distribution and paraphrased samples. This underscores the significance of incorporating hierarchical discourse features in the analysis of text patterns. The code and dataset are available at https://github.com/minnesotanlp/threads-of-subtlety.
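
To make the idea of discourse-level features concrete, here is a minimal sketch of turning a hierarchical discourse tree into a feature vector. It is not the authors' pipeline: the `Node` class, the relation labels, and the use of parent-to-child label pairs as "motifs" are simplifying assumptions for illustration. The paper works with parse trees and recursive hypergraphs, and its actual code lives in the repository linked above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy discourse tree (hypothetical structure).

    `label` is a relation name, or "EDU" for a leaf text span.
    """
    label: str
    children: list["Node"] = field(default_factory=list)


def motif_counts(root: Node) -> Counter:
    """Count parent->child label pairs, a simple stand-in for discourse motifs."""
    counts: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            counts[(node.label, child.label)] += 1
            stack.append(child)
    return counts


def motif_features(root: Node, vocabulary: list[tuple[str, str]]) -> list[float]:
    """Map motif counts to a fixed-length vector of relative frequencies."""
    counts = motif_counts(root)
    total = sum(counts.values())
    if total == 0:
        # A single-leaf tree has no edges, so every feature is zero.
        return [0.0] * len(vocabulary)
    return [counts[motif] / total for motif in vocabulary]


# Hypothetical tree: Elaboration(EDU, Contrast(EDU, EDU))
tree = Node("Elaboration", [Node("EDU"), Node("Contrast", [Node("EDU"), Node("EDU")])])
vocab = [("Elaboration", "EDU"), ("Elaboration", "Contrast"), ("Contrast", "EDU")]
features = motif_features(tree, vocab)
print(features)
```

A vector like `features` could be concatenated with other text features and passed to any binary classifier. Which motifs to include, and how to weight them, are design choices this sketch does not settle.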

Code Implementations: 1 repo
