CL | AI | May 21, 2024

Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text

arXiv:2405.12689v2 | 32 citations | h-index: 13 | Has Code | ACL
Originality: Incremental advance
AI Analysis

This work addresses the need for fine-grained detection of AI-generated content in applications such as text refinement. It is an incremental advance because it builds on existing text-level detection methods.

The paper tackles the detection of AI-paraphrased text spans within documents. It proposes a framework, PTD, that assigns a paraphrasing score to each sentence, and reports that it is effective in both in-distribution and out-of-distribution tests.

AI-generated text detection has attracted increasing attention as powerful language models approach human-level generation. Limited work is devoted to detecting (partially) AI-paraphrased texts. However, AI paraphrasing is commonly employed in various application scenarios for text refinement and diversity. To this end, we propose a novel detection framework, paraphrased text span detection (PTD), aiming to identify paraphrased text spans within a text. Different from text-level detection, PTD takes in the full text and assigns each sentence a score indicating its degree of paraphrasing. We construct a dedicated dataset, PASTED, for paraphrased text span detection. Both in-distribution and out-of-distribution results demonstrate the effectiveness of PTD models in identifying AI-paraphrased text spans. Statistical and model analysis explains the crucial role of the surrounding context of the paraphrased text spans. Extensive experiments show that PTD models can generalize to versatile paraphrasing prompts and multiple paraphrased text spans. We release our resources at https://github.com/Linzwcs/PASTED.
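
To make the input and output shape of sentence-level detection concrete, here is a minimal sketch of the plumbing the abstract describes: the full text goes in, every sentence gets a score, and consecutive high-scoring sentences are grouped into spans. It is not the authors' PTD model. The names `split_sentences`, `detect_spans`, and `toy_score`, the 0.5 threshold, and the keyword-based scorer are all hypothetical placeholders; a real system would replace `toy_score` with a trained model that sees the surrounding context.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def detect_spans(
    sentences: List[str],
    score_fn: Callable[[List[str], int], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[List[float], List[Tuple[int, int]]]:
    # Score every sentence; the scorer receives the whole document plus an
    # index, so it can use the surrounding context if it wants to.
    scores = [score_fn(sentences, i) for i in range(len(sentences))]

    # Group runs of consecutive sentences at or above the threshold into
    # inclusive (start, end) index spans.
    spans: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(sentences) - 1))
    return scores, spans


def toy_score(sentences: List[str], i: int) -> float:
    # Placeholder scorer (assumption): flags a marker word only to exercise
    # the pipeline. It says nothing about real paraphrase detection.
    return 1.0 if "moreover" in sentences[i].lower() else 0.0


if __name__ == "__main__":
    text = (
        "I went home. Moreover, the journey was pleasant. "
        "Moreover, it was brief. The end."
    )
    sents = split_sentences(text)
    scores, spans = detect_spans(sents, toy_score)
    print(scores)
    print(spans)
```

Passing the whole sentence list to the scorer, rather than one sentence at a time, reflects the abstract's point that surrounding context plays a crucial role.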

Code Implementations: 1 repo
