CY CL Aug 3, 2020

Lanfrica: A Participatory Approach to Documenting Machine Translation Research on African Languages

Chris C. Emezue, Bonaventure F. P. Dossou

arXiv:2008.07302v15.19 citations

Originality: Synthesis-oriented

AI Analysis

The paper addresses the problem of fragmented research documentation for African languages. The contribution is incremental, building on existing efforts to include these languages in NLP.

The paper tackles the challenge of tracking machine translation research for African languages by introducing Lanfrica, a participatory framework for documenting research, models, and datasets, aiming to improve accessibility and reproducibility.

Over the years, there have been campaigns to include African languages in the growing research on machine translation (MT) in particular, and natural language processing (NLP) in general. Africa has the highest language diversity, with 1500-2000 documented languages and many more undocumented or extinct languages (Lewis, 2009; Bendor-Samuel, 2017). This makes it hard to keep track of the MT research, models, and datasets that have been developed for some of them. As the internet and social media are part of the daily lives of more than half of the world (Lin, 2020), as well as over 40% of Africans (Campbell, 2019), online platforms can be useful in making research, benchmarks, and datasets in these African languages accessible, thereby improving the reproducibility and sharing of existing research and its results. In this paper, we introduce Lanfrica, a novel, ongoing framework that employs a participatory approach to documenting research, projects, benchmarks, and datasets on African languages.
