CL · Nov 3, 2023

Efficient Black-Box Adversarial Attacks on Neural Text Detectors

arXiv:2311.01873v1128 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses security vulnerabilities in AI-generated text detection but is an incremental advance, since it builds on existing adversarial attack methods.

The paper tackles the problem of fooling neural text detectors by applying simple strategies, such as parameter tweaking and character-level mutations, to GPT-3.5-generated text, causing misclassification while the altered text stays unsuspicious to human readers.
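The summary does not say which parameters were tweaked. A common reading is the decoding settings that an API such as GPT-3.5's exposes, for example the sampling temperature; that choice is an assumption here, not a detail taken from the paper. The Python sketch below uses hypothetical function names and toy logits to show how temperature reshapes a next-token distribution: dividing the logits by a larger temperature flattens the softmax, so lower-probability tokens are sampled more often.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw next-token scores into probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))

rng = random.Random(0)  # seeded so the draws are reproducible
print([sample_token(logits, 1.5, rng) for _ in range(10)])
```

One intuition, offered as an assumption rather than a finding of the paper, is that a detector trained mostly on text sampled with default settings may be sensitive to this kind of shift in token statistics.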

Neural text detectors are models trained to detect whether a given text was generated by a language model or written by a human. In this paper, we investigate three simple and resource-efficient strategies (parameter tweaking, prompt engineering, and character-level mutations) to alter texts generated by GPT-3.5 in ways that are unsuspicious or unnoticeable to humans but cause misclassification by neural text detectors. The results show that parameter tweaking and character-level mutations, in particular, are effective strategies.
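As an illustration of a character-level mutation, the Python sketch below performs homoglyph substitution, swapping some Latin letters for visually near-identical Cyrillic code points. Whether this matches the exact mutations studied in the paper is an assumption; the `HOMOGLYPHS` table and `mutate_homoglyphs` function are hypothetical, and a seeded `random.Random` keeps the output reproducible.

```python
import random

# Hypothetical homoglyph table: Latin letters mapped to visually
# similar Cyrillic code points.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small letter a
    "e": "\u0435",  # Cyrillic small letter ie
    "o": "\u043e",  # Cyrillic small letter o
    "p": "\u0440",  # Cyrillic small letter er
    "c": "\u0441",  # Cyrillic small letter es
}


def mutate_homoglyphs(text, rate, seed=None):
    """Replace each eligible character independently with probability `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])  # swap in the lookalike character
        else:
            out.append(ch)  # leave the character unchanged
    return "".join(out)


original = "The model produces a coherent paragraph."
mutated = mutate_homoglyphs(original, rate=0.3, seed=42)
# Each substitution is one-for-one, so the strings align position by position.
changed = sum(a != b for a, b in zip(original, mutated))
print(mutated)
print(f"{changed} of {len(original)} characters changed")
```

To a human reader the mutated sentence looks essentially unchanged, but its underlying code points differ, so a detector's tokenizer sees a different input sequence. This is the kind of low-cost, hard-to-notice change the abstract describes, though how much any particular detector's prediction shifts is not shown here.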

Code Implementations: 1 repo