CL · Nov 4, 2025

PragExTra: A Multilingual Corpus of Pragmatic Explicitation in Translation

Doreen Osmelak, Koel Dutta Chowdhury, Uliana Sentsova, Cristina España-Bonet, Josef van Genabith

arXiv:2511.02721v1 · 2.7 · h-index: 41

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making machine translation more culturally aware by providing a measurable, cross-linguistic dataset and detection method, representing an incremental step in computational translation studies.

The paper models pragmatic explicitation in translation, the phenomenon in which translators add cultural background details for a new audience. It introduces PragExTra, described as the first multilingual corpus and detection framework for this phenomenon. The classifiers reach up to 0.88 accuracy and 0.82 F1 across languages, and active learning improves classifier accuracy by 7-8 percentage points.

Translators often enrich texts with background details that make implicit cultural meanings explicit for new audiences. This phenomenon, known as pragmatic explicitation, has been widely discussed in translation theory but rarely modeled computationally. We introduce PragExTra, the first multilingual corpus and detection framework for pragmatic explicitation. The corpus covers eight language pairs from TED-Multi and Europarl and includes additions such as entity descriptions, measurement conversions, and translator remarks. We identify candidate explicitation cases through null alignments and refine them using active learning with human annotation. Our results show that entity and system-level explicitations are most frequent, and that active learning improves classifier accuracy by 7-8 percentage points, achieving up to 0.88 accuracy and 0.82 F1 across languages. PragExTra establishes pragmatic explicitation as a measurable, cross-linguistic phenomenon and takes a step towards building culturally aware machine translation.

Keywords: translation, multilingualism, explicitation
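To make the null-alignment idea concrete, here is a minimal Python sketch of how candidate explicitation spans might be extracted from a word-aligned sentence pair. It is an illustration under stated assumptions, not the authors' pipeline: the function name `null_aligned_spans`, the `min_len` threshold, the alignment format (a set of source-index/target-index pairs), and the example sentence pair are all hypothetical. The abstract does not say which aligner or filtering rules were used.

```python
from typing import List, Set, Tuple


def null_aligned_spans(
    target_tokens: List[str],
    alignments: Set[Tuple[int, int]],
    min_len: int = 2,  # hypothetical threshold to skip isolated unaligned tokens
) -> List[Tuple[int, int, str]]:
    """Return maximal runs of target tokens that have no aligned source token.

    `alignments` holds (source_index, target_index) pairs, such as a word
    aligner might produce. Each result is (start, end_exclusive, text).
    """
    aligned_targets = {t for _, t in alignments}
    spans = []
    start = None
    # Iterate one position past the end so a run that reaches the last
    # token is also closed and recorded.
    for i in range(len(target_tokens) + 1):
        unaligned = i < len(target_tokens) and i not in aligned_targets
        if unaligned and start is None:
            start = i
        elif not unaligned and start is not None:
            if i - start >= min_len:
                spans.append((start, i, " ".join(target_tokens[start:i])))
            start = None
    return spans


# Made-up sentence pair: the translation adds a description of the entity.
source = "Ich lebe in Saarbrücken".split()
target = "I live in Saarbrücken , a city in western Germany".split()
alignments = {(0, 0), (1, 1), (2, 2), (3, 3)}

candidates = null_aligned_spans(target, alignments)
for start, end, text in candidates:
    print(start, end, text)
```

Spans found this way are only candidates. Unaligned material can also come from alignment errors or free translation, which is presumably why the abstract describes a later refinement step with active learning and human annotation.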
