CL · Jun 18, 2016

Egyptian Arabic to English Statistical Machine Translation System for NIST OpenMT'2015

arXiv:1606.05759v1 · 5 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses machine translation for dialectal Arabic. Its contribution is incremental: it applies existing SMT methods to a specific domain.

The paper tackles the translation of informal Egyptian Arabic into English for the NIST OpenMT'2015 competition, focusing on SMS, chat, and speech data. The resulting system placed second in all three genres.

The paper describes the Egyptian Arabic-to-English statistical machine translation (SMT) system that the QCRI-Columbia-NYUAD (QCN) group submitted to the NIST OpenMT'2015 competition. The competition focused on informal dialectal Arabic, as used in SMS, chat, and speech. Thus, our efforts focused on processing and standardizing Arabic, e.g., using tools such as 3arrib and MADAMIRA. We further trained a phrase-based SMT system using state-of-the-art features and components such as operation sequence model, class-based language model, sparse features, neural network joint model, genre-based hierarchically-interpolated language model, unsupervised transliteration mining, phrase-table merging, and hypothesis combination. Our system ranked second on all three genres.
