CL | Apr 3, 2025

Language Models reach higher Agreement than Humans in Historical Interpretation

arXiv:2504.02572v1 | 1 citation | h-index: 15
Originality: Incremental advance
AI Analysis

The paper compares historical annotations produced by humans and by Large Language Models (LLMs). Both exhibit cultural bias, but LLMs reach higher consensus when interpreting historical facts from short texts: humans disagree because of personal biases, whereas LLMs disagree when they skip information or hallucinate. This enables large-scale annotation and quantitative analysis in the digital humanities, offering new educational and research opportunities to explore historical interpretations and bias.

This paper compares historical annotations by humans and Large Language Models. The findings reveal that both exhibit some cultural bias, but Large Language Models achieve a higher consensus on the interpretation of historical facts from short texts. While humans tend to disagree on the basis of their personal biases, Large Language Models disagree when they skip information or produce hallucinations. These findings have significant implications for digital humanities, enabling large-scale annotation and quantitative analysis of historical data. This offers new educational and research opportunities to explore historical interpretations from different Language Models, fostering critical thinking about bias.

