Masakhane -- Machine Translation For Africa
The paper addresses the underrepresentation of African languages in NLP, a problem affecting researchers and communities across Africa. Its contribution is incremental in that it builds on existing community-driven approaches.
The paper tackles the lack of resources and research for African languages in NLP by establishing MASAKHANE, an open-source, distributed community effort for machine translation. It reports that the community has spurred research and begun to address key barriers such as the lack of benchmarks and the lack of a research community.
Africa has over 2000 languages. Despite this, African languages account for a small portion of available resources and publications in Natural Language Processing (NLP). This is due to multiple factors, including a lack of focus from government and funding, poor discoverability, a lack of community, sheer language complexity, difficulty in reproducing papers, and the absence of benchmarks for comparing techniques. To begin to address these problems, MASAKHANE, an open-source, continent-wide, distributed, online research effort for machine translation for African languages, was founded. In this paper, we discuss our methodology for building the community and spurring research from the African continent, and we outline the community's success in addressing the identified problems affecting African NLP.