CL · Dec 12, 2015

A Hidden Markov Model Based System for Entity Extraction from Social Media English Text at FIRE 2015

arXiv:1512.03950

Originality: Synthesis-oriented

AI Analysis

The paper is an incremental improvement in entity extraction from social media text, evaluated on English in the FIRE 2015 shared task.

The paper tackles entity extraction from social media English text using a trigram Hidden Markov Model with gazetteer lists and word-level features. It reports the best performance for English in the FIRE 2015 task, with precision of 61.96, recall of 39.46, and F-measure of 48.21.
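As a quick consistency check on the reported scores, the F-measure can be recomputed from precision and recall. The sketch below assumes the balanced F1 (harmonic mean); the function name `f_measure` and the `beta` parameter are illustrative, not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Scores as reported for the English run (in percent).
print(round(f_measure(61.96, 39.46), 2))  # 48.21
```

The recomputed value agrees with the reported F-measure, which suggests the paper uses the standard balanced definition.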

This paper presents the experiments we carried out at Jadavpur University as part of our participation in the FIRE 2015 task Entity Extraction from Social Media Text - Indian Languages (ESM-IL). The tool we developed for the task is based on a trigram Hidden Markov Model (HMM) that uses information such as gazetteer lists, POS tags, and other word-level features to enhance the observation probabilities of both known and unknown tokens. We submitted runs for English only. The system was trained and tested on the datasets released for the task. Our system is the best performer for English, obtaining precision, recall, and F-measure of 61.96, 39.46, and 48.21, respectively.
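To make the idea of a trigram HMM tagger with gazetteer support concrete, here is a minimal sketch of Viterbi decoding over tag pairs. It is not the authors' implementation: the abstract does not say how the gazetteer adjusts observation probabilities, so the `emission` helper, the `boost` and `floor` constants, the tag names, and the toy probabilities are all assumptions made for illustration.

```python
import math

START = "<s>"


def emission(tag, word, emit, gazetteer, floor=1e-6, boost=0.5):
    """P(word | tag), with a hypothetical gazetteer back-off for unknown tokens."""
    if (tag, word) in emit:
        return emit[(tag, word)]
    # Assumption: an unseen word listed in the gazetteer under this tag
    # receives a fixed, boosted observation probability.
    if gazetteer.get(word.lower()) == tag:
        return boost
    return floor


def viterbi_trigram(tokens, tags, trans, emit, gazetteer, floor=1e-6):
    """Most likely tag sequence under a second-order (trigram) HMM.

    trans[(t2, t1, t)] approximates P(t | t2, t1); missing entries fall back
    to `floor`, a crude stand-in for proper smoothing.
    """
    # State = (previous tag, current tag); value = (log-probability, path so far).
    best = {(START, START): (0.0, [])}
    for word in tokens:
        new_best = {}
        for (t2, t1), (score, path) in best.items():
            for t in tags:
                p = trans.get((t2, t1, t), floor) * emission(t, word, emit, gazetteer, floor)
                cand = score + math.log(p)
                key = (t1, t)
                if key not in new_best or cand > new_best[key][0]:
                    new_best[key] = (cand, path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy model: "sachin" is never seen in training but appears in the gazetteer.
tags = ["O", "PER"]
trans = {(START, START, "O"): 0.8, (START, "O", "PER"): 0.3}
emit = {("O", "meet"): 0.9}
gazetteer = {"sachin": "PER"}
result = viterbi_trigram(["meet", "sachin"], tags, trans, emit, gazetteer)
print(result)
```

In this toy run the unknown token is tagged through the gazetteer back-off alone. A real system would estimate the transition and emission tables from annotated data and use a principled smoothing scheme in place of the fixed floor.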

