MotàMot project: conversion of a French-Khmer published dictionary for building a multilingual lexical system
This project addresses the need for digital language resources in developing and emerging countries such as Cambodia, Laos, Vietnam, Malaysia, and Thailand. Its contribution is incremental, since it builds on existing dictionary data and tools.
The MotàMot project tackled the challenge of computerizing the under-resourced Khmer language by developing a multilingual lexical system, resulting in an online resource with data converted from a French-Khmer dictionary and made accessible via a REST API.
Economic issues related to information processing techniques are very important. The development of such technologies is a major asset for developing countries like Cambodia and Laos, and for emerging ones like Vietnam, Malaysia and Thailand. The MotàMot project aims to computerize an under-resourced language: Khmer, spoken mainly in Cambodia. The main goal of the project is the development of a multilingual lexical system targeting Khmer. The macrostructure is a pivot one, with each word sense of each language linked to a pivot axie. The microstructure is a simplification of that of the Explanatory and Combinatorial Dictionary. The lexical system was initialized with data coming mainly from the conversion of Denis Richer's French-Khmer bilingual dictionary from Word to XML format. The French part was completed with pronunciations and parts of speech taken from the FeM French-English-Malay dictionary. The Khmer headwords, noted in IPA in the Richer dictionary, were converted to Khmer script with OpenFST, a finite-state transducer tool. The resulting resource is available online for lookup, editing, download and remote programming via a REST API on a Jibiki platform.
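
The pivot macrostructure described above can be illustrated with a minimal Python sketch. It is not the Jibiki data model: the class names, the string pivot identifiers, the language codes and the three sample entries are all assumptions made for illustration. The only idea taken from the abstract is that word senses are never linked to each other directly; each is linked to a shared pivot entry, and translations are found by going through that pivot.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WordSense:
    """One sense of one headword in one language (hypothetical structure)."""
    lang: str       # language code, e.g. "fra" or "khm" (assumed codes)
    headword: str
    sense_id: int   # distinguishes several senses of the same headword


class PivotLexicon:
    """Toy pivot macrostructure: senses link only to language-independent
    pivot entries, identified here by made-up string ids."""

    def __init__(self):
        self._senses_by_pivot = defaultdict(set)
        self._pivots_by_sense = defaultdict(set)

    def link(self, sense, pivot_id):
        """Attach a word sense to a pivot entry."""
        self._senses_by_pivot[pivot_id].add(sense)
        self._pivots_by_sense[sense].add(pivot_id)

    def translations(self, sense, target_lang):
        """Return senses in target_lang that share a pivot entry with sense."""
        found = set()
        for pivot_id in self._pivots_by_sense.get(sense, ()):
            for other in self._senses_by_pivot[pivot_id]:
                if other.lang == target_lang and other != sense:
                    found.add(other)
        return sorted(found, key=lambda s: (s.headword, s.sense_id))


# Illustrative data only: three senses meaning "house" share one pivot entry.
lexicon = PivotLexicon()
maison = WordSense("fra", "maison", 1)
phteah = WordSense("khm", "ផ្ទះ", 1)
house = WordSense("eng", "house", 1)
for sense in (maison, phteah, house):
    lexicon.link(sense, "pivot:house")

print([s.headword for s in lexicon.translations(maison, "khm")])  # ['ផ្ទះ']
```

With this layout, adding a language means linking its senses to existing pivot entries rather than building a new bilingual dictionary for every language pair.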