CL · Nov 22, 2022

ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English

arXiv:2211.12000v1 · 20 citations · h-index: 62
Originality: Synthesis-oriented
AI Analysis

This work provides a valuable resource for researchers studying code-switching phenomena and developing NLP systems for Egyptian Arabic-English, though it is incremental, since it extends an existing dataset.

The researchers created ArzEn-ST, a three-way speech translation corpus for code-switched Egyptian Arabic-English, by extending an existing speech corpus with translations into monolingual Egyptian Arabic and into monolingual English. They made the corpus publicly available and reported baseline results for machine translation and speech translation tasks.

We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic - English Speech Translation Corpus. This corpus is an extension of the ArzEn speech corpus, which was collected through informal interviews with bilingual speakers. In this work, we collect translations in both directions, into monolingual Egyptian Arabic and into monolingual English, forming a three-way speech translation corpus. We make the translation guidelines and corpus publicly available. We also report results for baseline systems for machine translation and speech translation tasks. We believe this is a valuable resource that can motivate and facilitate further research studying the code-switching phenomenon from a linguistic perspective and can be used to train and evaluate NLP systems.
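To make the "three-way" structure concrete, here is a minimal Python sketch, assuming each corpus entry pairs a speech segment with its code-switched transcription and its two monolingual translations. The `ThreeWaySegment` class, its field names, and the `mt_pairs`/`st_pairs` helpers are hypothetical illustrations and do not reflect the format of the released ArzEn-ST files; the example sentence is invented.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical schema: field names are illustrative, not taken from the
# released ArzEn-ST files.
TARGETS = ("egyptian_arabic", "english")


@dataclass(frozen=True)
class ThreeWaySegment:
    """One speech segment with its code-switched transcription and two translations."""
    audio_path: str       # path to the segment's audio (hypothetical)
    code_switched: str    # transcription of the mixed Egyptian Arabic-English speech
    egyptian_arabic: str  # translation into monolingual Egyptian Arabic
    english: str          # translation into monolingual English


def _check_target(target: str) -> None:
    # Only the two monolingual translation sides are valid targets.
    if target not in TARGETS:
        raise ValueError(f"target must be one of {TARGETS}, got {target!r}")


def mt_pairs(segments: List[ThreeWaySegment], target: str) -> List[Tuple[str, str]]:
    """Machine translation pairs: code-switched transcription -> monolingual text."""
    _check_target(target)
    return [(s.code_switched, getattr(s, target)) for s in segments]


def st_pairs(segments: List[ThreeWaySegment], target: str) -> List[Tuple[str, str]]:
    """Speech translation pairs: audio -> monolingual text."""
    _check_target(target)
    return [(s.audio_path, getattr(s, target)) for s in segments]


if __name__ == "__main__":
    # Invented example segment, for illustration only.
    seg = ThreeWaySegment(
        audio_path="clip_0001.wav",
        code_switched="انا كنت بشتغل في ال project ده",
        egyptian_arabic="انا كنت بشتغل في المشروع ده",
        english="I was working on this project",
    )
    print(mt_pairs([seg], "english"))
    print(st_pairs([seg], "egyptian_arabic"))
```

Because every segment carries both translations, the same entries can yield pairs for either target language, for both the machine translation and the speech translation tasks.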
