CL, LG | Nov 1, 2024

Enhancing Authorship Attribution through Embedding Fusion: A Novel Approach with Masked and Encoder-Decoder Language Models

Arjun Ramesh Kaushik, Sunil Rufus R P, Nalini Ratha

arXiv:2411.00411v11.92 citations | h-index: 43 | ICPR

Originality: Highly original

AI Analysis

This work addresses the need for reliable methods to distinguish AI-generated from human-written content, which is growing as AI-generated text becomes more common, and it reports a strong gain in authorship attribution.

The authors tackled the problem of distinguishing AI-generated from human-authored text by proposing an embedding fusion framework that integrates multiple language models, achieving over 96% accuracy and an MCC above 0.93 on a balanced dataset.

The increasing prevalence of AI-generated content alongside human-written text underscores the need for reliable discrimination methods. To address this challenge, we propose a novel framework with textual embeddings from Pre-trained Language Models (PLMs) to distinguish AI-generated and human-authored text. Our approach utilizes Embedding Fusion to integrate semantic information from multiple Language Models, harnessing their complementary strengths to enhance performance. Through extensive evaluation across publicly available diverse datasets, our proposed approach demonstrates strong performance, achieving classification accuracy greater than 96% and a Matthews Correlation Coefficient (MCC) greater than 0.93. This evaluation is conducted on a balanced dataset of texts generated from five well-known Large Language Models (LLMs), highlighting the effectiveness and robustness of our novel methodology.
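As a rough illustration of the two ideas named in the abstract, the sketch below fuses per-model embeddings and computes the Matthews Correlation Coefficient for binary labels. The abstract does not say which fusion operator is used, so concatenation of L2-normalized vectors is an assumption made here for illustration. The function names `fuse_embeddings` and `matthews_corrcoef` and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def fuse_embeddings(embeddings):
    """Fuse one text's embeddings from several models into a single vector.

    Assumption: fusion is plain concatenation after L2-normalizing each
    model's vector. The paper's actual fusion method may differ.
    """
    fused = []
    for vec in embeddings:
        # Normalize so that no single model dominates by scale.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        fused.extend(x / norm for x in vec)
    return fused


def matthews_corrcoef(y_true, y_pred):
    """Binary MCC, with 1 = AI-generated and 0 = human-authored."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator is 0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy vectors standing in for embeddings from two different language models.
fused = fuse_embeddings([[3.0, 4.0], [0.0, 2.0]])

# Toy labels: one AI-generated text is misclassified as human-authored.
mcc = matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0])
```

The fused vector would then be passed to a downstream classifier. MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy.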
