CL DL HC

Apr 11, 2023

A Survey of Resources and Methods for Natural Language Processing of Serbian Language

Ulfeta A. Marovac, Aldina R. Avdić, Nikola Lj. Milošević

arXiv:2304.05468v1

3 citations

h-index: 9

Originality: Synthesis-oriented

AI Analysis

The paper addresses the challenge of natural language processing (NLP) for Serbian speakers and researchers, but its contribution is incremental: it reviews existing work rather than introducing new methods.

This paper surveys existing resources and methods for natural language processing of Serbian, a low-resourced and high-inflectional language, highlighting initiatives from the past three decades such as corpora development and models for tasks like classification and named entity recognition.

Serbian is a Slavic language spoken by over 12 million people and well understood by over 15 million. In natural language processing, it can be considered a low-resourced language. It is also highly inflected. The combination of many word inflections and scarce language resources makes natural language processing of Serbian challenging. Nevertheless, over the past three decades, there have been a number of initiatives to develop resources and methods for natural language processing of Serbian. These range from corpora of free text from books and the internet, through annotated corpora for classification and named entity recognition tasks, to various methods and models for performing these tasks. In this paper, we review the initiatives, resources, methods, and their availability.
