CL
Mar 4, 2017

Lexical Resources for Hindi Marathi MT

arXiv:1703.01485v1
9 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses machine translation quality for one specific language pair, Hindi-Marathi, and reports incremental improvements from lexical resources.

The paper improves statistical machine translation between Hindi and Marathi by augmenting the training corpus with lexical resources such as IndoWordnet, function words, and verb phrases. The result is an incremental gain in quality and better coverage in low-resource settings.

In this paper we describe ways to use lexical resources to improve the quality of a statistical machine translation system. We augment the training corpus with lexical resources such as the IndoWordnet semantic relation set, function words, kridanta pairs, and verb phrases. Our research focuses on two uses of these resources: augmenting the parallel corpus with more vocabulary, and augmenting it with various word forms. We present case studies, evaluations, and a detailed error analysis for both Marathi-to-Hindi and Hindi-to-Marathi machine translation systems. The evaluations show that translation quality improves incrementally as more lexical resources are used. Moreover, using lexical resources helps improve the coverage and quality of machine translation where only a limited parallel corpus is available.
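
The corpus augmentation described in the abstract can be illustrated with a minimal sketch. It assumes, as a simplification not confirmed by the abstract, that each lexical resource is available as a list of (Hindi, Marathi) pairs and that these pairs are simply appended to the parallel corpus as extra aligned lines. The function names `augment_corpus` and `source_vocabulary` and the toy data are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (Hindi side, Marathi side)


def augment_corpus(parallel: Iterable[Pair], *resources: Iterable[Pair]) -> List[Pair]:
    """Append lexical-resource entries to a parallel corpus as extra aligned pairs.

    Assumption: every resource is already a list of aligned (source, target)
    strings. Empty entries and exact duplicates are skipped.
    """
    seen: Set[Pair] = set()
    augmented: List[Pair] = []
    for collection in (parallel, *resources):
        for src, tgt in collection:
            pair = (src.strip(), tgt.strip())
            if not pair[0] or not pair[1] or pair in seen:
                continue
            seen.add(pair)
            augmented.append(pair)
    return augmented


def source_vocabulary(pairs: Iterable[Pair]) -> Set[str]:
    """Collect whitespace-separated source-side tokens as a rough coverage measure."""
    return {token for src, _ in pairs for token in src.split()}


# Toy data, for illustration only (not taken from the paper).
corpus = [("यह घर है", "हे घर आहे")]
wordnet_pairs = [("पानी", "पाणी")]
function_words = [("और", "आणि"), ("पानी", "पाणी")]  # second entry is a duplicate

augmented = augment_corpus(corpus, wordnet_pairs, function_words)
print(len(source_vocabulary(corpus)), "source tokens before augmentation")
print(len(source_vocabulary(augmented)), "source tokens after augmentation")
```

Comparing the source vocabulary before and after augmentation gives a rough view of the coverage gain. An actual system would then retrain its alignment and translation models on the augmented corpus, which this sketch does not attempt.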

