CL · AI · Feb 17, 2024

ToBlend: Token-Level Blending With an Ensemble of LLMs to Attack AI-Generated Text Detection

arXiv:2402.11167v2 · 7 citations · h-index: 15 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the robustness of AI-content detection in NLG applications and represents an incremental advance in adversarial generation strategies.

The study tackles the problem of AI-generated text detection by proposing ToBlend, a token-level ensemble method that uses multiple LLMs to generate text and substantially lowers the detection accuracy of mainstream detection models.

The robustness of AI-content detection models against sophisticated adversarial strategies, such as paraphrasing or word switching, is a rising concern in natural language generation (NLG) applications. This study proposes ToBlend, a novel token-level ensemble text generation method that challenges the robustness of current AI-content detection approaches by utilizing multiple sets of candidate generative large language models (LLMs). By randomly sampling token(s) from candidate LLM sets, we find that ToBlend significantly degrades the performance of most mainstream AI-content detection methods. We evaluate the quality of text produced under different ToBlend settings based on annotations from experienced human experts. We also propose a fine-tuned Llama 3.1 model that distinguishes ToBlend-generated text more accurately. Our findings underscore the potential of the proposed text generation approach both for deceiving detection models and for improving them. Our datasets, code, and annotations are open-sourced.
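To make the token-level blending idea concrete, here is a minimal sketch, assuming that every step (or every chunk of `tokens_per_switch` tokens) is generated by one candidate model chosen uniformly at random, always conditioned on the shared context so far. The toy `formal_model` and `casual_model` functions stand in for real LLMs, and every name here (`blend_generate`, `tokens_per_switch`, and so on) is hypothetical; this is an illustration of the general technique, not the authors' released code.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# A "model" here is any callable mapping the current token context to a
# next-token probability distribution. A real LLM would be wrapped to fit this.
NextTokenDist = Callable[[Sequence[str]], Dict[str, float]]


def sample_token(dist: Dict[str, float], rng: random.Random) -> str:
    """Draw one token from a {token: probability} distribution."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def blend_generate(
    models: List[NextTokenDist],
    prompt: List[str],
    max_new_tokens: int,
    tokens_per_switch: int = 1,
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Token-level blending sketch (hypothetical helper).

    Every `tokens_per_switch` steps, one candidate model is picked uniformly
    at random and generates the next chunk from the shared context.
    Returns the generated tokens and the index of the model behind each one.
    """
    if not models:
        raise ValueError("need at least one candidate model")
    if tokens_per_switch < 1:
        raise ValueError("tokens_per_switch must be >= 1")
    rng = random.Random(seed)
    context = list(prompt)
    generated: List[str] = []
    sources: List[int] = []
    model_idx = 0
    for step in range(max_new_tokens):
        # Re-draw the active model at the start of each chunk.
        if step % tokens_per_switch == 0:
            model_idx = rng.randrange(len(models))
        token = sample_token(models[model_idx](context), rng)
        context.append(token)  # all models see the same blended context
        generated.append(token)
        sources.append(model_idx)
    return generated, sources


# Toy stand-ins for two LLMs with different "styles" (illustrative only).
def formal_model(context: Sequence[str]) -> Dict[str, float]:
    return {"therefore": 0.5, "moreover": 0.3, "the": 0.2}


def casual_model(context: Sequence[str]) -> Dict[str, float]:
    return {"so": 0.5, "like": 0.3, "the": 0.2}


if __name__ == "__main__":
    tokens, src = blend_generate(
        [formal_model, casual_model], ["hello"], 8, tokens_per_switch=2, seed=42
    )
    for tok, idx in zip(tokens, src):
        print(f"model {idx}: {tok}")
```

Because the resulting text interleaves the token statistics of several generators, a detector tuned to the distribution of any single model sees a mixture instead, which is the intuition behind the reported drop in detection performance. Setting `tokens_per_switch` above 1 corresponds to the "token(s)" wording in the abstract, under the assumption that chunks are contiguous.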

