CL | AI | LG | Jan 22, 2024

Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

arXiv:2401.12070v3 | 295 citations | h-index: 41 | ICML
Originality: Highly original
AI Analysis

This work addresses the challenge of identifying AI-generated content for applications such as content moderation and academic integrity. It offers a novel, training-free detector with high accuracy.

The paper tackles the detection of machine-generated text from large language models (LLMs). It proposes Binoculars, a zero-shot detector whose score contrasts two pre-trained LLMs. Without any training data, it detects over 90% of generated samples from ChatGPT and other LLMs at a 0.01% false positive rate.
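The headline figure pairs a detection rate with a fixed false positive rate, so it may help to see how such a number is typically read off from detector scores. The sketch below is a generic illustration, not the paper's evaluation code: the function name `tpr_at_fpr` and the convention that a higher score means "more machine-like" are assumptions made here. If a detector uses the opposite convention, negate its scores first.

```python
import numpy as np

def tpr_at_fpr(human_scores, machine_scores, target_fpr):
    """Pick a threshold on human-written scores so that at most
    `target_fpr` of them are flagged, then report how many machine
    samples are caught at that threshold.

    Assumption (hypothetical convention): a sample is flagged as
    machine-generated when its score is strictly above the threshold.
    """
    human = np.sort(np.asarray(human_scores, dtype=float))
    machine = np.asarray(machine_scores, dtype=float)

    # Number of human samples we are allowed to misclassify.
    allowed = int(np.floor(target_fpr * human.size))
    allowed = min(allowed, human.size - 1)

    # Threshold at the (allowed + 1)-th largest human score, so that
    # no more than `allowed` human scores lie strictly above it.
    threshold = human[human.size - allowed - 1]

    fpr = float(np.mean(human > threshold))
    tpr = float(np.mean(machine > threshold))
    return threshold, fpr, tpr
```

Note that a false positive rate of 0.01% corresponds to one misflagged document in 10,000, so measuring it meaningfully requires at least that many human-written samples.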

Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
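The abstract does not spell out the score itself. One way to contrast two closely related models, and the form used here as an assumption, is to divide the log-perplexity of the text under one model by the cross-perplexity between the two models' next-token distributions. The sketch below works on precomputed logits so that it stays self-contained. The names `model_a_logits` and `model_b_logits`, and the choice of which model plays which role, are illustrative rather than taken from this page.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def contrast_score(model_a_logits, model_b_logits, token_ids):
    """Ratio of log-perplexity to cross-perplexity (assumed form).

    model_a_logits, model_b_logits: arrays of shape (T, V), where row i
        holds each model's next-token logits for position i.
    token_ids: integer array of shape (T,), the tokens actually observed.
    """
    logp_a = log_softmax(np.asarray(model_a_logits, dtype=float))
    logp_b = log_softmax(np.asarray(model_b_logits, dtype=float))
    token_ids = np.asarray(token_ids)

    # Log-perplexity: mean negative log-likelihood of the observed
    # tokens under model A.
    positions = np.arange(token_ids.shape[0])
    log_ppl = -logp_a[positions, token_ids].mean()

    # Cross-perplexity: mean cross-entropy between model A's predicted
    # distribution and model B's log-probabilities at each position.
    cross_ppl = -(np.exp(logp_a) * logp_b).sum(axis=-1).mean()

    return log_ppl / cross_ppl
```

Under this assumed form, text that a model finds unsurprising relative to what a sibling model would expect yields a low ratio, which would be read as evidence of machine generation. The decision threshold has to be chosen separately on reference data.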

Code Implementations: 1 repo
