CL · Nov 3, 2023

Efficient Black-Box Adversarial Attacks on Neural Text Detectors

arXiv:2311.01873v1 · 21.0128 citations · h-index: 3 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses security vulnerabilities in AI-generated text detection, but it is incremental because it builds on existing adversarial attack methods.

The paper tackles the problem of fooling neural text detectors with simple strategies, such as parameter tweaking and character-level mutations, applied to GPT-3.5-generated text. The altered texts are misclassified by the detectors while the changes remain unnoticeable to human readers.

Neural text detectors are models trained to detect whether a given text was generated by a language model or written by a human. In this paper, we investigate three simple and resource-efficient strategies (parameter tweaking, prompt engineering, and character-level mutations) to alter texts generated by GPT-3.5 that are unsuspicious or unnoticeable for humans but cause misclassification by neural text detectors. The results show that especially parameter tweaking and character-level mutations are effective strategies.
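To make the idea of a character-level mutation concrete, here is a minimal Python sketch. The abstract does not say which mutations the authors use, so the homoglyph substitution shown here (swapping Latin letters for visually similar Cyrillic ones) is an assumption chosen for illustration. The names HOMOGLYPHS and mutate, and the rate and seed parameters, are hypothetical and do not come from the paper or its code.

```python
import random

# Hypothetical homoglyph table (an assumption, not taken from the paper):
# Latin letters mapped to visually similar Cyrillic code points.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small letter a
    "c": "\u0441",  # Cyrillic small letter es
    "e": "\u0435",  # Cyrillic small letter ie
    "o": "\u043e",  # Cyrillic small letter o
    "p": "\u0440",  # Cyrillic small letter er
}


def mutate(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Replace each eligible character with its homoglyph with probability `rate`."""
    rng = random.Random(seed)  # local RNG so results are reproducible
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sample = "Neural text detectors can be sensitive to small changes."
    mutated = mutate(sample, rate=0.3)
    # The two strings look alike on screen but differ in code points.
    print(mutated)
    print(sum(a != b for a, b in zip(sample, mutated)), "characters changed")
```

The mutated string has the same length and looks almost the same on screen, but its code points differ, so a detector may tokenize it differently. Whether a particular detector is actually fooled by this has to be measured and is not implied by the sketch.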
