CL · DL · LG · Mar 11, 2025

Automating Violence Detection and Categorization from Ancient Texts

arXiv:2503.08192v1 · 11 citations · h-index: 1 · Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)
Originality: Synthesis-oriented
AI Analysis

The paper addresses the laborious manual data harvesting required for violence research in the humanities, particularly by historians analyzing societal dynamics. Its contribution is incremental, since it applies existing LLM methods to a new domain.

The study evaluates large language models (LLMs) for automating violence detection and categorization in ancient texts, reporting F1-scores of up to 0.93 for detection and 0.86 for categorization.

Violence descriptions in literature offer valuable insights for a wide range of research in the humanities. For historians, depictions of violence are of special interest for analyzing the societal dynamics surrounding large wars and individual conflicts of influential people. Harvesting data for violence research manually is laborious and time-consuming. This study is the first one to evaluate the effectiveness of large language models (LLMs) in identifying violence in ancient texts and categorizing it across multiple dimensions. Our experiments identify LLMs as a valuable tool to scale up the accurate analysis of historical texts and show the effect of fine-tuning and data augmentation, yielding an F1-score of up to 0.93 for violence detection and 0.86 for fine-grained violence categorization.

