CL · Apr 3, 2025

Language Models reach higher Agreement than Humans in Historical Interpretation

arXiv:2504.02572v1 · 4.91 citations · h-index: 15

Originality: Incremental advance

AI Analysis

These findings enable large-scale annotation and quantitative analysis in the digital humanities, offering new educational and research opportunities to explore historical interpretations and bias.

The paper compares historical annotations produced by humans and by Large Language Models (LLMs). Both exhibit cultural bias, but LLMs reach higher consensus when interpreting historical facts from short texts. Humans disagree because of personal biases, whereas LLMs disagree when they skip information or hallucinate.

This paper compares historical annotations by humans and Large Language Models. The findings reveal that both exhibit some cultural bias, but Large Language Models achieve a higher consensus on the interpretation of historical facts from short texts. While humans tend to disagree on the basis of their personal biases, Large Language Models disagree when they skip information or produce hallucinations. These findings have significant implications for the digital humanities, enabling large-scale annotation and quantitative analysis of historical data. This offers new educational and research opportunities to explore historical interpretations from different Language Models, fostering critical thinking about bias.
