CL | Jan 20

GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark

Lotta Kiefer, Christoph Leiter, Sotaro Takeshita, Elena Schmidt, Steffen Eger

arXiv:2601.13711v1 | 0.6 | h-index: 4

Originality: Synthesis-oriented

AI Analysis

This work addresses a gap in authorship verification (AV) research for non-English languages by providing a new benchmark for German AV. The contribution is incremental, however, because it applies existing methods to a new domain.

The authors tackle the scarcity of large-scale benchmarks for German authorship verification by introducing GerAV, a comprehensive dataset with over 600k labeled text pairs. Their best approach, a fine-tuned large language model, outperforms recent baselines by up to 0.09 absolute F1 and surpasses GPT-5 in a zero-shot setting by 0.08.

Authorship verification (AV) is the task of determining whether two texts were written by the same author and has been studied extensively, predominantly for English data. In contrast, large-scale benchmarks and systematic evaluations for other languages remain scarce. We address this gap by introducing GerAV, a comprehensive benchmark for German AV comprising over 600k labeled text pairs. GerAV is built from Twitter and Reddit data, with the Reddit part further divided into in-domain and cross-domain message-based subsets, as well as a profile-based subset. This design enables controlled analysis of the effects of data source, topical domain, and text length. Using the provided training splits, we conduct a systematic evaluation of strong baselines and state-of-the-art models and find that our best approach, a fine-tuned large language model, outperforms recent baselines by up to 0.09 absolute F1 score and surpasses GPT-5 in a zero-shot setting by 0.08. We further observe a trade-off between specialization and generalization: models trained on specific data types perform best under matching conditions but generalize less well across data regimes, a limitation that can be mitigated by combining training sources. Overall, GerAV provides a challenging and versatile benchmark for advancing research on German and cross-domain AV.
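To make the reported F1 differences concrete, the following is a minimal sketch of how an AV system could be scored on labeled text pairs. It is illustrative only: the `TextPair` class, the `f1_score` helper, and the toy pairs are hypothetical and not taken from GerAV, and the sketch assumes binary F1 on the "same author" class, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TextPair:
    """Hypothetical pair format: two texts plus a gold same-author label."""
    text_a: str
    text_b: str
    same_author: bool


def f1_score(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Binary F1 for the positive ("same author") class.

    Assumption: the benchmark's exact F1 variant may differ (e.g. macro F1).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
pairs = [
    TextPair("Moin, wie geht's?", "Moin zusammen, alles klar?", True),
    TextPair("Das Spiel war super.", "Ich fand das Spiel klasse.", True),
    TextPair("Heute regnet es.", "Der neue Patch ist da.", False),
    TextPair("Guten Morgen!", "Kaufempfehlung für Laptops?", False),
]
gold_labels = [pair.same_author for pair in pairs]
model_predictions = [True, False, False, True]  # stand-in for a model's output

score = f1_score(gold_labels, model_predictions)
```

On this scale, an improvement of 0.09 absolute F1 corresponds, for example, to moving from 0.80 to 0.89.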
