CL · AI · Jun 28, 2024

Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood

arXiv:2406.19874v2 · 33 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of detecting AI-generated text as models become more human-like, with potential applications in content moderation and authenticity verification.

The study tackled the problem of distinguishing human from model-generated text by using relative likelihood values and spectral features, achieving competitive performance with previous zero-shot methods and setting a new state-of-the-art for short-text detection.

Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, this is becoming increasingly difficult as language models' capabilities for generating human-like text keep evolving. This study provides a new perspective by using relative likelihood values instead of absolute ones, and by extracting useful features from the spectrum view of likelihood for the human-model text detection task. We propose a detection procedure with two classification methods, one supervised and one heuristic-based, which achieves performance competitive with previous zero-shot detection methods and a new state-of-the-art on short-text detection. Our method can also reveal subtle differences between human and model languages, which have theoretical roots in psycholinguistic studies. Our code is available at https://github.com/CLCS-SUSTech/FourierGPT
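To make the idea of a "spectrum view" of relative likelihood more concrete, here is a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that "relative likelihood" can be approximated by z-scoring the per-token log-likelihoods of a text, and that the "spectrum" is the magnitude of a discrete Fourier transform of that sequence. The function names and the sample values are hypothetical; real log-likelihoods would come from a language model.

```python
import cmath
import math
from statistics import mean, pstdev


def relative_likelihood(log_probs):
    """Z-score per-token log-likelihoods (an assumed reading of 'relative')."""
    mu = mean(log_probs)
    sigma = pstdev(log_probs)
    if sigma == 0:
        # A constant sequence carries no relative information.
        return [0.0 for _ in log_probs]
    return [(x - mu) / sigma for x in log_probs]


def magnitude_spectrum(signal):
    """Magnitudes of the one-sided discrete Fourier transform (naive O(n^2))."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff))
    return spectrum


# Hypothetical per-token log-likelihoods, made up for illustration only.
example_log_probs = [-2.1, -0.4, -3.3, -0.9, -1.7, -0.2, -4.0, -1.1]
z = relative_likelihood(example_log_probs)
spec = magnitude_spectrum(z)
```

Under these assumptions, `spec` is a fixed-length-per-text feature vector that a supervised classifier or a hand-written heuristic could consume. Because z-scoring removes the mean, the zero-frequency component is (numerically) zero, so any signal lies in how likelihood fluctuates across the text rather than in its absolute level.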

Code Implementations: 1 repo