CL · May 22, 2014

Computerization of African languages-French dictionaries

arXiv:1405.5893v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the scarcity of digital resources for African languages such as Bambara and Hausa. Its contribution is incremental: it focuses on format conversion rather than on new linguistic analysis.

The paper tackles the lack of NLP tools for under-resourced African languages by converting five bilingual African language-French dictionaries from Word format to XML following the LMF model, and by making them available online on the Jibiki platform for lookup and modification.

This paper reports on work done during the DiLAF project. The work consists of converting five bilingual African language-French dictionaries, originally in Word format, into XML following the LMF model. The languages processed are Bambara, Hausa, Kanuri, Tamajaq and Songhai-Zarma, which are still considered under-resourced with respect to Natural Language Processing tools. Once converted, the dictionaries are available online on the Jibiki platform for lookup and modification. The DiLAF project is presented first, followed by a description of each dictionary. The methodology for converting .doc files to XML is then presented, with a specific section on the use of Unicode. Next, each step of the conversion into XML and LMF is detailed. The last part presents the Jibiki lexical resources management platform used for the project.
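As a rough illustration of what the target of such a conversion might look like, the sketch below builds a single LMF-style entry in Python. It is not the DiLAF pipeline: the helper names `feat` and `make_entry` are hypothetical, the element layout follows the generic LMF `feat att/val` serialization rather than the project's actual schema, and the headword is a made-up string (not a real dictionary entry) chosen only to show Unicode NFC normalization of a decomposed character.

```python
import unicodedata
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an LMF-style <feat att="..." val="..."/> child (hypothetical helper)."""
    return ET.SubElement(parent, "feat", {"att": att, "val": val})


def make_entry(headword, pos, french_gloss):
    """Build one LexicalEntry element from fields already extracted from the source file."""
    # Normalize to NFC so a base letter plus combining mark becomes one code point.
    headword = unicodedata.normalize("NFC", headword)

    entry = ET.Element("LexicalEntry")
    feat(entry, "partOfSpeech", pos)

    lemma = ET.SubElement(entry, "Lemma")
    feat(lemma, "writtenForm", headword)

    # The French translation is stored as an equivalent attached to a sense.
    sense = ET.SubElement(entry, "Sense")
    equivalent = ET.SubElement(sense, "Equivalent")
    feat(equivalent, "language", "fra")
    feat(equivalent, "writtenForm", french_gloss)
    return entry


# Made-up input: "o" followed by a combining acute accent (U+0301).
entry = make_entry("fo\u0301", "noun", "exemple")
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Normalizing before serialization means that lookups on the platform compare one canonical form of each headword, regardless of how the character was typed in the original Word file.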
