CL, AI · May 6, 2024

Vietnamese AI Generated Text Detection

Quang-Dan Tran, Van-Quan Nguyen, Quang-Huy Pham, K. B. Thang Nguyen, Trong-Hop Do

arXiv:2405.03206v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of distinguishing AI-generated text from human-written text in Vietnamese. Its contribution is incremental: it applies existing detection methods to a dataset in a new language.

The study tackles the detection of AI-generated Vietnamese text by building ViDetect, a dataset of 6,800 essay samples, and benchmarking pretrained models (ViT5, BARTpho, PhoBERT, mDeBERTa V3, and mBERT) on it. The reported results indicate that these methods remain effective for Vietnamese.

In recent years, Large Language Models (LLMs) have become integrated into daily life, serving as invaluable assistants for completing tasks. Because LLMs are so widely adopted, their misuse is inevitable, particularly for generating text content for various purposes, which makes it difficult to distinguish LLM-generated text from human-written text. In this study, we present ViDetect, a dataset of 6,800 Vietnamese essays, 3,400 written by humans and the remaining 3,400 generated by LLMs, intended for detecting AI-generated text. We evaluated state-of-the-art methods on it, including ViT5, BARTpho, PhoBERT, mDeBERTa V3, and mBERT. The results not only contribute to the growing body of research on AI-generated text detection but also demonstrate the adaptability and effectiveness of these methods in the Vietnamese language context. This research lays a foundation for future advances in AI-generated text detection and offers insights for researchers in natural language processing.
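Because ViDetect is balanced (3,400 human vs. 3,400 LLM essays), detection is a binary classification task where accuracy is directly interpretable. The abstract does not state the split or the metrics used, so the following is only a minimal sketch under assumptions: a stratified 80/20 split, 0/1 labels with AI-generated text as the positive class, and the usual accuracy/precision/recall/F1. The helper names `stratified_split` and `binary_metrics`, the label encoding, and the split ratio are hypothetical, not the authors' code. The records are placeholders matching the dataset's composition, not real essays, and in practice `y_pred` would come from a fine-tuned model such as PhoBERT.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical label convention for this sketch (not specified in the abstract).
HUMAN, AI = 0, 1


def stratified_split(samples: List[Tuple[str, int]], test_frac: float = 0.2, seed: int = 42):
    """Split (text, label) pairs so each label keeps the same share in train and test."""
    rng = random.Random(seed)
    by_label: Dict[int, List[Tuple[str, int]]] = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]          # copy so the caller's list is untouched
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    rng.shuffle(train)                      # interleave the two classes
    rng.shuffle(test)
    return train, test


def binary_metrics(y_true: List[int], y_pred: List[int], positive: int = AI) -> Dict[str, float]:
    """Accuracy plus precision/recall/F1 for the positive (AI-generated) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Placeholder records with ViDetect's 3,400 / 3,400 composition (not real data).
    data = [(f"human_essay_{i}", HUMAN) for i in range(3400)] + [
        (f"ai_essay_{i}", AI) for i in range(3400)
    ]
    train, test = stratified_split(data, test_frac=0.2)
    print(len(train), len(test))  # 5440 1360

    # Trivial baseline standing in for a fine-tuned model: always predict "AI".
    y_true = [label for _, label in test]
    y_pred = [AI] * len(y_true)
    print(binary_metrics(y_true, y_pred))
    # {'accuracy': 0.5, 'precision': 0.5, 'recall': 1.0, 'f1': 0.6666666666666666}
```

On a balanced test set the always-"AI" baseline scores 0.5 accuracy, which is the floor any fine-tuned detector should clear. Stratifying the split keeps that 50/50 balance in both partitions, so accuracy stays comparable across models.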

