CL · Jan 20

GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark

arXiv:2601.13711v1 · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses a gap in authorship verification research for non-English languages by providing a new benchmark for German AV. Its contribution is incremental, however, since it applies existing methods to a new domain.

To address the scarcity of large-scale benchmarks for German authorship verification, the authors introduce GerAV, a dataset of over 600k labeled text pairs. Their best approach, a fine-tuned large language model, outperforms recent baselines by up to 0.09 absolute F1 and surpasses zero-shot GPT-5 by 0.08.

Authorship verification (AV) is the task of determining whether two texts were written by the same author and has been studied extensively, predominantly for English data. In contrast, large-scale benchmarks and systematic evaluations for other languages remain scarce. We address this gap by introducing GerAV, a comprehensive benchmark for German AV comprising over 600k labeled text pairs. GerAV is built from Twitter and Reddit data, with the Reddit part further divided into in-domain and cross-domain message-based subsets, as well as a profile-based subset. This design enables controlled analysis of the effects of data source, topical domain, and text length. Using the provided training splits, we conduct a systematic evaluation of strong baselines and state-of-the-art models and find that our best approach, a fine-tuned large language model, outperforms recent baselines by up to 0.09 absolute F1 score and surpasses GPT-5 in a zero-shot setting by 0.08. We further observe a trade-off between specialization and generalization: models trained on specific data types perform best under matching conditions but generalize less well across data regimes, a limitation that can be mitigated by combining training sources. Overall, GerAV provides a challenging and versatile benchmark for advancing research on German and cross-domain AV.
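To make the reported "absolute F1" gains concrete, here is a minimal sketch of how F1 could be computed for a pairwise same-author decision. It assumes F1 is measured on the positive ("same author") class; the abstract does not say which F1 variant the paper uses. The function name `f1_score` and the toy labels are hypothetical and are not taken from GerAV.

```python
def f1_score(gold, predicted):
    """F1 for the positive class, where True means "same author".

    Assumption: positive-class F1; the paper may use a different variant.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # correctly verified pairs
    fp = sum(1 for g, p in pairs if not g and p)    # different authors, predicted same
    fn = sum(1 for g, p in pairs if g and not p)    # same author, predicted different
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: one label per text pair (not GerAV data).
gold = [True, True, True, False, False, False]
baseline = [True, False, False, True, False, False]
improved = [True, True, False, True, False, False]

baseline_f1 = f1_score(gold, baseline)
improved_f1 = f1_score(gold, improved)

# An "absolute" gain is the plain difference between two F1 scores,
# not a relative (percentage) improvement.
absolute_gain = improved_f1 - baseline_f1
print(f"baseline={baseline_f1:.3f} improved={improved_f1:.3f} gain={absolute_gain:.3f}")
```

Read this way, an improvement of 0.09 absolute F1 means the difference between the two scores is 0.09 on the 0 to 1 scale.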

