CL, LG · Oct 31, 2016

Towards Deep Learning in Hindi NER: An approach to tackle the Labelled Data Scarcity

Vinayak Athavale, Shreenivas Bharadwaj, Monik Pamecha, Ameya Prabhu, Manish Shrivastava

arXiv:1610.09756

Originality: Incremental advance

AI Analysis

This paper addresses the scarcity of labeled data for Hindi NER, a problem that matters for natural language processing applications in Hindi-speaking regions, though the work appears to be an incremental adaptation of existing neural methods to a new language.

The authors tackled the problem of labeled data scarcity for Hindi Named Entity Recognition by developing a language-independent neural model based on Bi-Directional RNN-LSTM with word vectors, achieving state-of-the-art performance in both English and Hindi without morphological analysis or gazetteers.

In this paper we describe an end-to-end neural model for Named Entity Recognition (NER) which is based on a Bi-Directional RNN-LSTM. Almost all NER systems for Hindi use language-specific features and handcrafted rules with gazetteers. Our model is language independent and uses no domain-specific features or handcrafted rules. Our models rely on semantic information in the form of word vectors, which are learnt by an unsupervised learning algorithm on an unannotated corpus. Our model attained state-of-the-art performance in both English and Hindi without the use of any morphological analysis or gazetteers of any sort.
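The architecture named in the abstract can be illustrated with a minimal, untrained forward pass. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not give layer sizes, the output layer, or the training procedure, so the per-token softmax classifier, the hidden size, the tag set, and the toy word vectors below are all hypothetical. It shows only how pre-trained word vectors can feed a forward and a backward LSTM whose states are concatenated per token, with no gazetteers or morphological features.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_matrix(rows, cols, rng):
    # Small random weights; a real model would learn these by training.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMCell:
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input, forget, output, candidate),
        # each applied to the concatenation [x; h_prev].
        self.w = {g: init_matrix(hidden_dim, input_dim + hidden_dim, rng) for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        pre = {g: [a + b for a, b in zip(matvec(self.w[g], z), self.b[g])] for g in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def run_lstm(cell, inputs):
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    states = []
    for x in inputs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


class BiLSTMTagger:
    def __init__(self, embeddings, hidden_dim, tags, seed=0):
        rng = random.Random(seed)
        self.embeddings = embeddings  # word -> vector, assumed pre-trained
        self.tags = tags
        dim = len(next(iter(embeddings.values())))
        self.unk = [0.0] * dim  # assumption: zero vector for unknown words
        self.fwd = LSTMCell(dim, hidden_dim, rng)
        self.bwd = LSTMCell(dim, hidden_dim, rng)
        self.out = init_matrix(len(tags), 2 * hidden_dim, rng)

    def tag_probabilities(self, tokens):
        xs = [self.embeddings.get(t, self.unk) for t in tokens]
        left = run_lstm(self.fwd, xs)               # reads left to right
        right = run_lstm(self.bwd, xs[::-1])[::-1]  # reads right to left
        return [softmax(matvec(self.out, l + r)) for l, r in zip(left, right)]

    def predict(self, tokens):
        probs = self.tag_probabilities(tokens)
        return [self.tags[max(range(len(p)), key=p.__getitem__)] for p in probs]


# Hypothetical 3-dimensional word vectors; real ones would be learnt
# without supervision from an unannotated corpus.
toy_vectors = {
    "दिल्ली": [0.9, 0.1, 0.0],
    "भारत": [0.8, 0.2, 0.1],
    "में": [0.0, 0.1, 0.9],
    "है": [0.1, 0.0, 0.8],
}
tagger = BiLSTMTagger(toy_vectors, hidden_dim=4, tags=["O", "LOC", "PER", "ORG"])
tokens = ["दिल्ली", "भारत", "में", "है"]
predicted = tagger.predict(tokens)
print(list(zip(tokens, predicted)))
```

Because the weights are random and untrained, the printed tags carry no meaning; the example only demonstrates the data flow from word vectors through both LSTM directions to one tag distribution per token.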

