CL · Mar 13, 2024

Can Large Language Models Identify Authorship?

arXiv:2403.08213v2 · 20.963 citations · h-index: 16 · Has Code · EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the need to verify content authenticity and mitigate misinformation, and it offers a new benchmark for LLM-based authorship analysis. It is an incremental advance, however, because it builds on existing text embedding methods.

The study tackled authorship identification by evaluating large language models (LLMs) on two tasks: zero-shot, end-to-end authorship verification, and attribution among multiple candidate authors. It showed that LLMs perform both tasks well without domain-specific fine-tuning and that their decisions can be explained through linguistic features.

The ability to accurately identify authorship is crucial for verifying content authenticity and mitigating misinformation. Large Language Models (LLMs) have demonstrated an exceptional capacity for reasoning and problem-solving. However, their potential in authorship analysis remains under-explored. Traditional studies have depended on hand-crafted stylistic features, whereas state-of-the-art approaches leverage text embeddings from pre-trained language models. These methods, which typically require fine-tuning on labeled data, often suffer from performance degradation in cross-domain applications and provide limited explainability. This work seeks to address three research questions: (1) Can LLMs perform zero-shot, end-to-end authorship verification effectively? (2) Are LLMs capable of accurately attributing authorship among multiple candidate authors (e.g., 10 and 20)? (3) Can LLMs provide explainability in authorship analysis, particularly through the role of linguistic features? Moreover, we investigate the integration of explicit linguistic features to guide LLMs in their reasoning processes. Our assessment demonstrates LLMs' proficiency in both tasks without the need for domain-specific fine-tuning, and a detailed analysis of linguistic features provides insight into their decision-making. This establishes a new benchmark for future research on LLM-based authorship analysis.
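The two zero-shot tasks described above (verification of a text pair, and attribution among N candidates), along with the optional guidance from explicit linguistic features, can be sketched as prompt construction plus answer parsing. This is a minimal Python sketch under stated assumptions. The function names, the feature list, the wording of the prompts, and the JSON answer format are all hypothetical illustrations, not the authors' actual prompts. The LLM call itself is deliberately omitted, so any chat model can be plugged in and its text reply passed to `parse_json_answer`.

```python
import json
import re

# Hypothetical feature list used to guide the model; not the paper's exact wording.
LINGUISTIC_FEATURES = [
    "punctuation style",
    "capitalization",
    "vocabulary and word choice",
    "sentence length and structure",
    "use of function words",
    "tone and formality",
]


def build_verification_prompt(text_1: str, text_2: str, guided: bool = True) -> str:
    """Zero-shot verification prompt: were both texts written by the same author?"""
    lines = ["Decide whether the two texts below were written by the same author."]
    if guided:
        # Optionally name explicit linguistic features to steer the model's reasoning.
        lines.append("Compare the texts on: " + ", ".join(LINGUISTIC_FEATURES) + ".")
    lines += [
        "Ignore differences in topic and content.",
        'Answer in JSON: {"analysis": "<reasoning>", "same_author": true or false}',
        f"Text 1: {text_1}",
        f"Text 2: {text_2}",
    ]
    return "\n".join(lines)


def build_attribution_prompt(query: str, candidates: dict[str, str]) -> str:
    """Zero-shot attribution prompt: pick the most likely author among N candidates."""
    lines = [
        "Given the known writing samples below, identify which author wrote the query text.",
        "Compare the texts on: " + ", ".join(LINGUISTIC_FEATURES) + ".",
        'Answer in JSON: {"analysis": "<reasoning>", "author": "<author id>"}',
    ]
    # One labeled writing sample per candidate author (e.g., 10 or 20 candidates).
    for author_id, sample in candidates.items():
        lines.append(f"[{author_id}] {sample}")
    lines.append(f"Query text: {query}")
    return "\n".join(lines)


def parse_json_answer(response: str) -> dict:
    """Extract and decode the JSON object embedded in a model's text reply."""
    # Greedy match spans from the first "{" to the last "}", assuming one object per reply.
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))


if __name__ == "__main__":
    prompt = build_attribution_prompt(
        "I reckon we ought to leave early; traffic will be dreadful.",
        {"A1": "Let's go now!!! so excited", "A2": "I reckon the weather will turn; best be ready."},
    )
    print(prompt)
    # Stand-in for a real model reply (no LLM is called in this sketch).
    reply = 'Here is my answer: {"analysis": "Both use reckon and semicolons.", "author": "A2"}'
    print(parse_json_answer(reply)["author"])
```

Asking for a JSON answer with a separate `analysis` field keeps the model's explanation alongside its decision, which mirrors the abstract's emphasis on explainability; the `guided` flag makes it easy to compare prompting with and without explicit linguistic features.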
