CL · AI · Oct 17, 2024

Detecting AI-Generated Texts in Cross-Domains

arXiv:2410.13966v1 · 7 citations · h-index: 2 · DocEng
Originality: Incremental advance
AI Analysis

This work addresses cross-domain AI text detection for users who need reliable and economical detection tools. It is an incremental advance because it builds on existing methods.

The paper tackles the problem of detecting AI-generated texts across new domains by fine-tuning a RoBERTa-Ranker model with minimal labeled data, achieving better performance than DetectGPT and GPTZero on both in-domain and cross-domain texts.

Existing tools to detect text generated by a large language model (LLM) have met with certain success, but their performance can drop when dealing with texts in new domains. To tackle this issue, we train a ranking classifier called RoBERTa-Ranker, a modified version of RoBERTa, as a baseline model using a dataset we constructed that includes a wider variety of texts written by humans and generated by various LLMs. We then present a method to fine-tune RoBERTa-Ranker that requires only a small amount of labeled data in a new domain. Experiments show that this fine-tuned domain-aware model outperforms the popular DetectGPT and GPTZero on both in-domain and cross-domain texts, where AI-generated texts may either be in a different domain or generated by a different LLM not used to generate the training datasets. This approach makes it feasible and economical to build a single system to detect AI-generated texts across various domains.
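The abstract describes RoBERTa-Ranker as a ranking classifier but does not state its training objective. As a rough illustration of what a ranking formulation can look like, the sketch below computes a pairwise margin (hinge) ranking loss over detector scores, assuming a higher score means "more likely AI-generated". The function names, the margin value, and the choice of hinge loss are assumptions made for illustration, not details taken from the paper.

```python
def pairwise_ranking_loss(ai_scores, human_scores, margin=1.0):
    """Mean hinge loss over every (AI, human) score pair.

    Assumption: the detector should score AI-generated text at least
    `margin` above human-written text. This is a generic ranking
    objective, not necessarily the one used by RoBERTa-Ranker.
    """
    losses = [
        max(0.0, margin - (a - h))
        for a in ai_scores
        for h in human_scores
    ]
    return sum(losses) / len(losses)


def pairwise_accuracy(ai_scores, human_scores):
    """Fraction of (AI, human) pairs in which the AI text scores strictly higher."""
    pairs = [(a, h) for a in ai_scores for h in human_scores]
    return sum(a > h for a, h in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical scores for two AI-generated and two human-written texts.
    ai = [2.0, 0.5]
    human = [0.0, 1.0]
    print(pairwise_ranking_loss(ai, human))
    print(pairwise_accuracy(ai, human))
```

In a fine-tuning setting of the kind the abstract describes, a loss like this would be minimized on the small labeled set from the new domain, while a pairwise accuracy like the one above could serve as a quick check of whether the ranking transfers.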

Code Implementations: 1 repo