CL AI LG | Sep 20, 2023

Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains

Areg Mikael Sarvazyan, José Ángel González, Marc Franco-Salvador, Francisco Rangel, Berta Chulvi, Paolo Rosso

arXiv:2309.11285v19.882 citations | h-index: 72 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of identifying AI-generated content, which is relevant to researchers and practitioners in natural language processing, but its contribution is incremental because it builds on existing shared tasks.

The paper presents the AuTexTification shared task at IberLEF 2023, which tackled the detection and attribution of machine-generated text across five domains and two languages (English and Spanish). Of the 114 teams that signed up, 36 submitted runs, and the dataset contains more than 160,000 texts.

This paper presents the overview of the AuTexTification shared task, part of the IberLEF 2023 Workshop (Iberian Languages Evaluation Forum), within the framework of the SEPLN 2023 conference. AuTexTification consists of two subtasks. For Subtask 1, participants had to determine whether a text is human-authored or has been generated by a large language model. For Subtask 2, participants had to attribute a machine-generated text to one of six different text generation models. Our AuTexTification 2023 dataset contains more than 160,000 texts across two languages (English and Spanish) and five domains (tweets, reviews, news, legal, and how-to articles). A total of 114 teams signed up to participate, of which 36 sent 175 runs, and 20 of them sent their working notes. In this overview, we present the AuTexTification dataset and task, the submitted participating systems, and the results.

