Intertextual Parallel Detection in Biblical Hebrew: A Transformer-Based Benchmark
This work addresses the labor-intensive and error-prone task of intertextual analysis for biblical scholars. It is an incremental contribution that applies existing methods to new data.
This study addresses the identification of parallel passages in biblical Hebrew, a task traditionally done by hand, by evaluating pre-trained transformer models such as E5 and AlephBERT. It finds that E5 is stronger at detecting parallels, while AlephBERT is better at differentiating non-parallel passages.
Identifying parallel passages in biblical Hebrew (BH) is central to biblical scholarship for understanding intertextual relationships. Traditional methods rely on manual comparison, a labor-intensive process prone to human error. This study evaluates the potential of pre-trained transformer-based language models, including E5, AlephBERT, MPNet, and LaBSE, for detecting textual parallels in the Hebrew Bible. Focusing on known parallels between Samuel/Kings and Chronicles, I assessed each model's capability to generate word embeddings distinguishing parallel from non-parallel passages. Using cosine similarity and Wasserstein distance, I found that E5 and AlephBERT show promise: E5 excels in parallel detection, while AlephBERT demonstrates stronger non-parallel differentiation. These findings indicate that pre-trained models can enhance the efficiency and accuracy of detecting intertextual parallels in ancient texts, suggesting broader applications for ancient language studies.
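To make the two measures named in the abstract concrete, the following is a minimal sketch in pure Python, not the study's implementation. It assumes that each passage has already been reduced to a single embedding vector, and that the Wasserstein distance is applied to the one-dimensional distributions of cosine scores for parallel versus non-parallel pairs. That use is one plausible reading, not something the abstract states. The function names and the toy scores are hypothetical and do not come from any model or from the paper.

```python
import math
from bisect import bisect_right


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two empirical 1-D samples.

    Computed as the area between the two empirical CDFs.
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Empirical CDF values are constant on [left, right).
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


# Hypothetical cosine scores (illustrative only, not results from the study).
parallel_scores = [0.91, 0.88, 0.95]
nonparallel_scores = [0.42, 0.55, 0.38]

# A larger distance means the two score distributions are better separated.
separation = wasserstein_1d(parallel_scores, nonparallel_scores)
```

Under this reading, a model whose parallel and non-parallel score distributions lie far apart (a large `separation`) would be easier to threshold when searching for unknown parallels.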