CL · AI · Apr 8, 2024

PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?

arXiv:2404.05483v1 · 28 citations · h-index: 7 · SemEval
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of identifying AI-generated content for applications in security and content moderation, but it is incremental as it builds on existing methods.

The paper tackles the detection of machine-generated texts in English by combining RoBERTa-base embeddings with diversity features and training on a resampled set. It ranked 12th out of 124 in Subtask A (monolingual track) of SemEval-2024 Task 8, with an accuracy of 0.91.

In this paper, we present our submission to the SemEval-2024 Task 8 "Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection", focusing on the detection of machine-generated texts (MGTs) in English. Specifically, our approach combines embeddings from RoBERTa-base with diversity features and uses a resampled training set. We rank 12th out of 124 in Subtask A (monolingual track), and our results show that our approach generalizes to unseen models and domains, achieving an accuracy of 0.91.
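
As a rough illustration of the general idea of combining an embedding with diversity features, the sketch below concatenates a precomputed text embedding with two simple lexical diversity measures. Everything here is an assumption made for illustration: the abstract does not say which diversity features were used, so type-token ratio and hapax ratio are stand-ins, the function names `diversity_features` and `combine` are hypothetical, and the short `fake_embedding` list stands in for a real RoBERTa-base vector (768 dimensions in practice).

```python
from collections import Counter
from typing import List, Sequence


def diversity_features(text: str) -> List[float]:
    """Return two simple lexical diversity measures for a text.

    Illustrative stand-ins only: these are not necessarily the
    diversity features used in the paper.
    """
    tokens = text.lower().split()
    if not tokens:
        return [0.0, 0.0]
    counts = Counter(tokens)
    # Share of distinct word forms among all tokens.
    type_token_ratio = len(counts) / len(tokens)
    # Share of tokens whose word form occurs exactly once.
    hapax_ratio = sum(1 for c in counts.values() if c == 1) / len(tokens)
    return [type_token_ratio, hapax_ratio]


def combine(embedding: Sequence[float], text: str) -> List[float]:
    """Concatenate a precomputed text embedding with diversity features."""
    return list(embedding) + diversity_features(text)


# Hypothetical 4-dimensional vector standing in for a RoBERTa-base embedding.
fake_embedding = [0.1, -0.2, 0.3, 0.0]
features = combine(fake_embedding, "the cat sat on the mat")
```

The combined vector could then be passed to any downstream classifier. Plain concatenation is only one way to fuse the two feature sets and is chosen here for simplicity.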
