From Consensus to Split Decisions: ABC-Stratified Sentiment in Holocaust Oral Histories
For researchers applying sentiment analysis to sensitive historical narratives, this work provides a diagnostic framework for understanding model divergence, but the results are incremental and domain-specific.
The study evaluates three off-the-shelf sentiment classifiers on a corpus of Holocaust oral histories (107,305 utterances, 579,013 sentences) and finds low-to-moderate inter-model agreement, driven mainly by boundary decisions around neutrality. It also introduces an agreement-based stability taxonomy (ABC) to stratify this disagreement.
Polarity detection becomes substantially more challenging under domain shift, particularly in heterogeneous, long-form narratives with complex discourse structure, such as Holocaust oral histories. This paper presents a corpus-scale diagnostic study of three off-the-shelf, pretrained transformer-based polarity classifiers applied to a corpus of Holocaust oral histories comprising 107,305 utterances and 579,013 sentences. After assembling the model outputs, we introduce an agreement-based stability taxonomy (ABC) to stratify outputs by inter-model stability. We report pairwise percent agreement, Cohen's kappa, Fleiss' kappa, and row-normalized confusion matrices to localize systematic disagreement. As an auxiliary descriptive signal, we apply a T5-based emotion classifier to stratified samples from each agreement stratum and compare emotion distributions across strata. Inter-model agreement is low to moderate overall and is driven primarily by boundary decisions around neutrality. Together, multi-model label triangulation and the ABC taxonomy provide a cautious, operational framework for characterizing where and how sentiment models diverge in sensitive historical narratives.
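The agreement statistics and the stratification described above can be sketched in a few lines of Python. This is a minimal, illustrative sketch under stated assumptions, not the authors' implementation. It assumes each model emits one of three labels (negative, neutral, positive). It also assumes a hypothetical ABC rule in which stratum A means all models agree, B means a strict majority agrees, and C means there is no majority; the source does not spell out the exact rule. The helper `stratified_sample`, its per-stratum size `k`, and the toy labels are likewise hypothetical.

```python
import random
from collections import Counter

# Assumed three-way polarity label set (not specified in the source).
LABELS = ("negative", "neutral", "positive")


def percent_agreement(a, b):
    """Fraction of items on which two models assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b, labels=LABELS):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), with p_e from each model's marginals."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in labels) / (n * n)
    if p_e == 1:
        return float("nan")  # undefined when both models use one identical label
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(rows, labels=LABELS):
    """Fleiss' kappa for items each labeled by the same number m >= 2 of models."""
    n_items, m = len(rows), len(rows[0])
    counts = [Counter(r) for r in rows]
    # Per-item agreement P_i = (sum_j n_ij^2 - m) / (m * (m - 1)).
    p_i = [(sum(c[k] ** 2 for k in labels) - m) / (m * (m - 1)) for c in counts]
    p_bar = sum(p_i) / n_items
    # Chance agreement from label proportions pooled over all n_items * m ratings.
    p_j = [sum(c[k] for c in counts) / (n_items * m) for k in labels]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


def row_normalized_confusion(a, b, labels=LABELS):
    """Model a (rows) vs model b (columns); each non-empty row sums to 1."""
    pairs = Counter(zip(a, b))
    matrix = {}
    for r in labels:
        total = sum(pairs[(r, c)] for c in labels)
        matrix[r] = {c: pairs[(r, c)] / total if total else 0.0 for c in labels}
    return matrix


def abc_stratum(row):
    """Hypothetical ABC rule: A = unanimous, B = strict majority, C = no majority."""
    top = Counter(row).most_common(1)[0][1]
    if top == len(row):
        return "A"
    return "B" if top > len(row) / 2 else "C"


def stratified_sample(strata, k, seed=0):
    """Hypothetical helper: draw up to k item indices from each stratum."""
    rng = random.Random(seed)
    by_stratum = {}
    for idx, s in enumerate(strata):
        by_stratum.setdefault(s, []).append(idx)
    return {s: rng.sample(ids, min(k, len(ids))) for s, ids in sorted(by_stratum.items())}


# Toy labels for three models over five utterances (illustrative only).
m1 = ["neutral", "positive", "negative", "neutral", "positive"]
m2 = ["neutral", "neutral", "negative", "positive", "positive"]
m3 = ["neutral", "positive", "neutral", "negative", "positive"]
rows = list(zip(m1, m2, m3))
strata = [abc_stratum(r) for r in rows]

if __name__ == "__main__":
    print("percent agreement (m1, m2):", percent_agreement(m1, m2))
    print("Cohen's kappa (m1, m2):", round(cohen_kappa(m1, m2), 3))
    print("Fleiss' kappa (m1, m2, m3):", round(fleiss_kappa(rows), 3))
    print("ABC strata:", strata)
    print("one sampled index per stratum:", stratified_sample(strata, k=1))
```

Running `cohen_kappa` on each model pair and `fleiss_kappa` on the zipped label triples gives pairwise and joint chance-corrected agreement. `row_normalized_confusion` shows, for each label of one model, how the other model redistributes those items, which is where disagreement at the neutral boundary would become visible. The per-stratum samples are the kind of input an auxiliary emotion classifier could then be run on.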