CL, Jul 9, 2014

A Survey of Named Entity Recognition in Assamese and other Indian Languages

Gitimoni Talukdar, Pranjal Protim Borah, Arup Baruah

arXiv:1407.2918v1, 9 citations

Originality: Synthesis-oriented

AI Analysis

This is an incremental survey paper addressing the problem of Named Entity Recognition for researchers working on low-resource Indian languages like Assamese.

This paper surveys Named Entity Recognition approaches for Indian languages, particularly Assamese, highlighting the challenges of resource scarcity and linguistic features like agglutination. It reviews existing rule-based and machine learning methods but does not present new experimental results or numerical improvements.

Named Entity Recognition is important for major Natural Language Processing tasks such as information extraction, question answering, machine translation, and document summarization. In this paper we present a survey of Named Entity Recognition in Indian languages, with particular reference to Assamese. Various rule-based and machine learning approaches are available for Named Entity Recognition. We first outline the available approaches and then discuss related research in this field. Assamese, like other Indian languages, is agglutinative and suffers from a lack of appropriate resources: Named Entity Recognition requires large data sets, gazetteer lists, dictionaries, and similar resources, and some useful features found in English, such as capitalization, are not available in Assamese. We also describe some of the issues faced when performing Named Entity Recognition in Assamese.
