CL, AI · Jul 16, 2024

Sharif-MGTD at SemEval-2024 Task 8: A Transformer-Based Approach to Detect Machine Generated Text

Seyedeh Fatemeh Ebrahimi, Karim Akhavan Azari, Amirmasoud Iravani, Arian Qazvini, Pouya Sadeghi, Zeinab Sadat Taghavi, Hossein Sameti

arXiv:2407.11774v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of identifying machine-generated text in NLP applications. Its contribution is incremental: it applies an existing method to a specific competition task and achieves moderate performance.

The paper frames the detection of machine-generated English text as a binary classification task and fine-tunes a RoBERTa-base transformer, achieving 78.9% accuracy on the test set and ranking 57th in the SemEval-2024 competition.

Detecting Machine-Generated Text (MGT) has emerged as a significant area of study within Natural Language Processing. When language models generate text, they often leave discernible traces, which can be analyzed using either traditional feature-based methods or more advanced neural language models. In this research, we explore the effectiveness of fine-tuning a RoBERTa-base transformer, a powerful neural architecture, to address MGT detection as a binary classification task. Focusing specifically on Subtask A (Monolingual-English) of SemEval-2024 Task 8, our proposed system achieves an accuracy of 78.9% on the test dataset, placing us 57th among participants. We address this challenge under limited hardware resources, and the resulting system performs well at identifying human-written texts but struggles to identify MGTs accurately.
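The abstract's observation that the system identifies human-written texts well but struggles with MGTs concerns per-class behaviour, which a single accuracy figure can hide. Below is a minimal sketch in plain Python of how overall accuracy and per-class recall might be computed from gold and predicted labels. The label encoding (0 = human, 1 = machine), the function name `evaluate`, and the toy data are hypothetical. None of them come from the paper or the official SemEval scorer.

```python
from collections import Counter

# Hypothetical label encoding for this sketch: 0 = human-written, 1 = machine-generated.
LABEL_NAMES = {0: "human", 1: "machine"}


def evaluate(gold, pred):
    """Return (accuracy, per-class recall) for binary MGT detection.

    Recall is reported only for classes that occur in `gold`.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    # Overall accuracy: fraction of positions where prediction matches gold.
    correct = sum(g == p for g, p in zip(gold, pred))
    accuracy = correct / len(gold)
    # Per-class recall: correct predictions for a class / gold count of that class.
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, pred) if g == p)
    recall = {LABEL_NAMES[label]: hits[label] / totals[label] for label in totals}
    return accuracy, recall


if __name__ == "__main__":
    # Toy, made-up predictions: every human text is recognised,
    # but half of the machine-generated texts are labelled human.
    gold = [0, 0, 0, 0, 1, 1, 1, 1]
    pred = [0, 0, 0, 0, 1, 0, 0, 1]
    acc, rec = evaluate(gold, pred)
    print(acc)  # 0.75
    print(rec)  # {'human': 1.0, 'machine': 0.5}
```

In this toy case an accuracy of 0.75 coexists with perfect human recall and only 0.5 machine recall. This is the kind of imbalance the abstract describes, and reporting recall per class makes it visible.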

