CL · Sep 12, 2015

Kannada Named Entity Recognition and Classification (NERC) Based on Multinomial Naïve Bayes (MNB) Classifier

arXiv:1509.04385v1 · 23 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses NERC for Kannada, a challenging natural language processing task, but the contribution is incremental: it applies an existing method to a specific language.

The paper tackles Kannada named entity recognition and classification with a model based on a Multinomial Naïve Bayes classifier, achieving an F1-measure of 81% on a test corpus of 5,000 tokens.

Named Entity Recognition and Classification (NERC) is the process of identifying proper nouns in text and classifying them into predefined categories such as person name, location, organization, date, and time. NERC in Kannada is an essential and challenging task. The aim of this work is to develop a novel model for NERC based on a Multinomial Naïve Bayes (MNB) classifier. The methodology is based on feature extraction from the training corpus, using term frequency and inverse document frequency and fitting them to a tf-idf vectorizer. The paper discusses the various issues in developing the proposed model, along with the details of implementation and performance evaluation. The experiments are conducted on a training corpus of 95,170 tokens and a test corpus of 5,000 tokens. The model achieves Precision, Recall and F1-measure of 83%, 79% and 81% respectively.
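
The pipeline described in the abstract (tf-idf features fed to a Multinomial Naïve Bayes classifier) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the character-trigram features, the romanized toy tokens, the two-label tag set, and all helper names (`char_ngrams`, `fit_idf`, `tfidf`, `TinyMultinomialNB`, `classify`, `f1_score`) are hypothetical, and the tf-idf weights are left unnormalized for brevity. The last function checks that the reported precision and recall are consistent with the reported F1-measure.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(token, n=3):
    """Character n-grams of a boundary-padded token (assumed feature choice)."""
    padded = f"<{token}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def fit_idf(docs):
    """Smoothed inverse document frequency; each token is one 'document'."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf(doc, idf):
    """Unnormalized tf-idf weights; terms unseen in training are dropped."""
    tf = Counter(term for term in doc if term in idf)
    return {term: count * idf[term] for term, count in tf.items()}


class TinyMultinomialNB:
    """Multinomial Naive Bayes over weighted features, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        vocab = {term for vec in vectors for term in vec}
        # Accumulate the total tf-idf mass of each term within each class.
        mass = defaultdict(lambda: defaultdict(float))
        for vec, label in zip(vectors, labels):
            for term, weight in vec.items():
                mass[label][term] += weight
        counts = Counter(labels)
        self.log_prior = {c: math.log(k / len(labels)) for c, k in counts.items()}
        self.log_like = {}
        for c in counts:
            denom = sum(mass[c].values()) + self.alpha * len(vocab)
            self.log_like[c] = {
                t: math.log((mass[c].get(t, 0.0) + self.alpha) / denom)
                for t in vocab
            }
        return self

    def predict(self, vec):
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_like[c][t]
                for t, w in vec.items() if t in self.log_like[c]
            )
        return max(self.log_prior, key=score)


# Toy, made-up training data (romanized); the paper's corpus has 95,170 tokens.
train = [
    ("bengaluru", "LOCATION"), ("mysuru", "LOCATION"), ("mangaluru", "LOCATION"),
    ("ramesh", "PERSON"), ("suresh", "PERSON"), ("mahesh", "PERSON"),
]
docs = [char_ngrams(token) for token, _ in train]
idf = fit_idf(docs)
model = TinyMultinomialNB().fit(
    [tfidf(doc, idf) for doc in docs], [label for _, label in train]
)


def classify(token):
    """Predict a label for a single token using the fitted toy model."""
    return model.predict(tfidf(char_ngrams(token), idf))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

On this toy data, `classify("ganesh")` returns `"PERSON"` and `classify("tumakuru")` returns `"LOCATION"`, because the only trigrams they share with the training tokens are the class-specific endings. `f1_score(0.83, 0.79)` evaluates to roughly 0.81, which matches the F1-measure reported above.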

