CL | Feb 10

Targum -- A Multilingual New Testament Translation Corpus

Maciej Rapacz, Aleksander Smywiński-Pohl

arXiv:2602.09724v1 | h-index: 1

Originality: Synthesis-oriented

AI Analysis

The corpus provides a new benchmark for quantitative research on translation history and supports flexible, multilevel analysis, though the contribution is incremental because it builds on existing resources.

The authors address the lack of depth in existing biblical translation corpora by introducing a multilingual corpus of 657 New Testament translations (352 of them unique) in five languages, each manually annotated with metadata for canonicalization.

Many European languages possess rich biblical translation histories, yet existing corpora, in prioritizing linguistic breadth, often fail to capture this depth. To address this gap, we introduce a multilingual corpus of 657 New Testament translations, of which 352 are unique, with unprecedented depth in five languages: English (208 unique versions from 396 total), French (41 from 78), Italian (18 from 33), Polish (30 from 48), and Spanish (55 from 102). Aggregated from 12 online biblical libraries and one preexisting corpus, each translation is manually annotated with metadata that maps the text to a standardized identifier for the work, its specific edition, and its year of revision. This canonicalization empowers researchers to define "uniqueness" for their own needs: they can perform micro-level analyses on translation families, such as the KJV lineage, or conduct macro-level studies by deduplicating closely related texts. By providing the first resource designed for such flexible, multilevel analysis, our corpus establishes a new benchmark for the quantitative study of translation history.
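The idea of letting researchers choose their own definition of "uniqueness" can be illustrated with a minimal Python sketch. It is not the authors' code or the corpus schema: the `Translation` record, its field names, the `deduplicate` helper, and all sample values are hypothetical placeholders, chosen only to mirror the three metadata levels named in the abstract (work, edition, year of revision).

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, List


@dataclass(frozen=True)
class Translation:
    # All field names are assumptions for illustration, not the corpus schema.
    source: str    # online library the text was collected from
    work_id: str   # standardized identifier of the work
    edition: str   # specific edition of that work
    year: int      # year of revision


def deduplicate(
    records: Iterable[Translation],
    key: Callable[[Translation], Hashable],
) -> List[Translation]:
    """Keep the first record seen for each distinct key value."""
    seen = {}
    for rec in records:
        seen.setdefault(key(rec), rec)
    return list(seen.values())


# Invented placeholder records: the same text collected from two libraries,
# a second edition of the same work, and an unrelated work.
records = [
    Translation("library_a", "work_a", "edition_1", 1900),
    Translation("library_b", "work_a", "edition_1", 1900),
    Translation("library_a", "work_a", "edition_2", 1950),
    Translation("library_c", "work_b", "edition_1", 1960),
]

# Micro level: a text is unique per (work, edition, year), so every
# revision within a translation family is kept.
micro = deduplicate(records, key=lambda r: (r.work_id, r.edition, r.year))

# Macro level: a text is unique per work, so closely related
# editions collapse into a single representative.
macro = deduplicate(records, key=lambda r: r.work_id)

# A translation family is then a simple filter on the work identifier.
family_a = [r for r in micro if r.work_id == "work_a"]

print(len(micro), len(macro), len(family_a))
```

Passing the key as a function keeps the choice of granularity outside the data itself, which is one way to read the abstract's claim that the same annotated corpus serves both micro-level and macro-level studies.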
