A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization
This paper provides a comprehensive survey that helps researchers and practitioners tackle the misinformation and disinformation risks of AI-generated text, although its contribution is incremental because it synthesizes existing work.
This paper reviews AI-generated text forensic systems, addressing the risks of LLM misuse by categorizing efforts into detection, attribution, and characterization, and discusses resources and future challenges in the field.
We have lately witnessed a rapid proliferation of advanced Large Language Models (LLMs) capable of generating high-quality text. While these LLMs have revolutionized text generation across various domains, they also pose significant risks to the information ecosystem, such as the potential for generating convincing propaganda, misinformation, and disinformation at scale. This paper offers a review of AI-generated text forensic systems, an emerging field addressing the challenges of LLM misuse. We present an overview of existing efforts in AI-generated text forensics by introducing a detailed taxonomy built on three primary pillars: detection, attribution, and characterization. These pillars enable a practical understanding of AI-generated text: identifying AI-generated content (detection), determining the specific AI model involved (attribution), and grouping the underlying intents of the text (characterization). Furthermore, we explore available resources for AI-generated text forensics research and discuss the evolving challenges and future directions of forensic systems in an AI era.