CL · Mar 24, 2025

MASRAD: Arabic Terminology Management Corpora with Semi-Automatic Construction

arXiv:2503.19211v2 · h-index: 1
Originality: Synthesis-oriented
AI Analysis

The work addresses the need for consistent terminology in academic translations and specialized Arabic documents, though it is incremental, building on existing extraction and similarity techniques.

The paper tackles Arabic terminology management by presenting MASRAD, a dataset of foreign-Arabic term pairs extracted from specialized books, and MASRAD-Ex, a semi-automatic construction method whose best-performing approach achieved 90.5% precision and 92.4% recall.
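The summary reports precision and recall but no single combined score. Assuming the standard F1 measure (the harmonic mean of precision $P$ and recall $R$), the two reported figures work out to roughly 91.4%; this is a derived number, not one stated by the authors:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.905 \times 0.924}{0.905 + 0.924} = \frac{1.67244}{1.829} \approx 0.914
$$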

This paper presents MASRAD, a terminology dataset for Arabic terminology management, and a method with supporting tools for its semi-automatic construction. The entries in MASRAD are $(f,a)$ pairs of foreign (non-Arabic) terms $f$ that appear in specialized, academic, and field-specific books next to their Arabic counterparts $a$. As a first step in constructing MASRAD, MASRAD-Ex systematically extracts these pairs. MASRAD helps improve term consistency in academic translations and specialized Arabic documents, and helps automate cross-lingual text processing. MASRAD-Ex leverages translated terms that occur organically in Arabic books and considers several candidate pairs for each term phrase: the candidate Arabic terms occur next to the foreign terms and vary in length. MASRAD-Ex computes lexicographic, phonetic, morphological, and semantic similarity metrics for each candidate pair, and uses heuristic, machine learning, and machine learning with post-processing approaches to select the best candidate. This paper presents MASRAD after thorough expert review and makes it available to the interested research community. The best-performing MASRAD-Ex approach achieved 90.5% precision and 92.4% recall.
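The candidate-generation and selection steps described in the abstract can be illustrated with a small Python sketch. It is not MASRAD-Ex: it assumes the foreign term appears in parentheses right after its Arabic counterpart, uses a hypothetical toy transliteration table (`AR2LAT`) and phonetic folding, and replaces the paper's four similarity metrics and learned models with two toy features (transliteration similarity via `difflib.SequenceMatcher` and word-count agreement) combined with hypothetical weights `w_lex` and `w_len`.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny Arabic-to-Latin map for illustration only;
# letters missing from it (e.g. ع) are simply dropped.
AR2LAT = {
    "ا": "a", "ب": "b", "ت": "t", "ر": "r", "ز": "z", "س": "s",
    "ك": "k", "ل": "l", "م": "m", "ن": "n", "و": "o", "ي": "y",
    "ى": "a", "ة": "a", "خ": "kh", "ف": "f", "د": "d", "ج": "j",
}

# Assumed layout: the foreign term sits in parentheses, e.g. "(protocol)".
FOREIGN_RE = re.compile(r"\(([A-Za-z][A-Za-z \-]*)\)")


def transliterate(arabic: str) -> str:
    """Map Arabic letters to Latin, dropping spaces and unmapped characters."""
    return "".join(AR2LAT.get(ch, "") for ch in arabic)


def normalize_foreign(term: str) -> str:
    """Toy phonetic folding: Arabic lacks p and hard c, so map them to b and k."""
    folded = term.lower().replace("p", "b").replace("c", "k")
    return re.sub(r"[^a-z]", "", folded)


def candidate_pairs(text: str, max_len: int = 3):
    """Yield (foreign, arabic_candidate) pairs: the last k words before each
    parenthesized foreign term, for k = 1..max_len (candidates vary in length)."""
    for m in FOREIGN_RE.finditer(text):
        foreign = m.group(1).strip()
        preceding = text[: m.start()].split()
        for k in range(1, min(max_len, len(preceding)) + 1):
            yield foreign, " ".join(preceding[-k:])


def score(foreign: str, arabic: str, w_lex: float = 0.8, w_len: float = 0.2) -> float:
    """Hypothetical weighted sum of two toy features for one candidate pair."""
    lex = SequenceMatcher(None, normalize_foreign(foreign), transliterate(arabic)).ratio()
    length_agreement = 1.0 / (1 + abs(len(foreign.split()) - len(arabic.split())))
    return w_lex * lex + w_len * length_agreement


def best_candidates(text: str, max_len: int = 3) -> dict:
    """Heuristic decision: keep the highest-scoring Arabic candidate per foreign
    term; on ties the shorter (earlier) candidate is kept."""
    best = {}
    for foreign, arabic in candidate_pairs(text, max_len):
        s = score(foreign, arabic)
        if foreign not in best or s > best[foreign][1]:
            best[foreign] = (arabic, s)
    return {f: a for f, (a, _) in best.items()}


if __name__ == "__main__":
    sample = "يعتمد الاتصال على بروتوكول (protocol) محدد"
    print(best_candidates(sample))  # {'protocol': 'بروتوكول'}
```

Because the sketch only compares transliterations, it favors loanwords such as بروتوكول; a translated term like شبكة (network) would need the morphological and semantic signals the paper describes, which are omitted here.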
