CL · Jun 5, 2019

Survey on Publicly Available Sinhala Natural Language Processing Tools and Research

arXiv:1906.02358v263.773 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

The paper provides a foundational resource for researchers working on NLP for Sinhala, an under-resourced language, but its contribution is incremental: it surveys existing work rather than introducing new methods.

This paper addresses the lack of coordination in Sinhala natural language processing by conducting a comprehensive survey of publicly available tools and research, aiming to help researchers better utilize existing contributions.

Sinhala is the native language of the Sinhalese people, who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning Indo-European language tree. However, due to poverty in both linguistic and economic capital, Sinhala remains, from the perspective of Natural Language Processing tools and research, a resource-poor language: it has neither the economic drive of its cousin English nor the sheer push of the law of numbers that a language such as Chinese has. A number of research groups from Sri Lanka have noticed this dearth and the resulting dire need for proper tools and research for Sinhala natural language processing. However, for various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill that gap with a comprehensive literature survey of the publicly available Sinhala natural language processing tools and research, so that researchers working in this field can better utilize the contributions of their peers. As such, we shall be uploading this paper to arXiv and updating it periodically to reflect the advances made in the field.
