CL · Oct 31, 2014

Experiments to Improve Named Entity Recognition on Turkish Tweets

arXiv:1410.8668v1 · 44 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses named entity recognition for Turkish social media texts. The contribution is incremental: it adapts existing methods to a specific domain.

The paper tackles poor named entity recognition performance on Turkish tweets by adapting a baseline system through rule relaxations and lexical expansions, and reports improved performance on two annotated datasets.

Social media texts are significant information sources for several application areas including trend analysis, event monitoring, and opinion mining. Unfortunately, existing solutions for tasks such as named entity recognition that perform well on formal texts usually perform poorly when applied to social media texts. In this paper, we report on experiments that have the purpose of improving named entity recognition on Turkish tweets, using two different annotated data sets. In these experiments, starting with a baseline named entity recognition system, we adapt its recognition rules and resources to better fit Twitter language by relaxing its capitalization constraint and by diacritics-based expansion of its lexical resources, and we employ a simplistic normalization scheme on tweets to observe the effects of these on the overall named entity recognition performance on Turkish tweets. The evaluation results of the system with these different settings are provided with discussions of these results.
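The three adaptations named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the function names, the toy lexicon, and the repeated-character rule used for normalization are all assumptions made for illustration, and the abstract does not specify how the actual rules are implemented.

```python
import re

# Turkish-specific letters and the ASCII letters often typed in their place.
ASCII_FOLD = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")


def expand_with_ascii_variants(entries):
    """Diacritics-based expansion: keep each entry and add its ASCII variant."""
    expanded = set()
    for entry in entries:
        expanded.add(entry)
        expanded.add(entry.translate(ASCII_FOLD))
    return expanded


def case_key(text):
    """Case-insensitive key. All four i-forms (I, İ, ı, i) map to 'i',
    because an uppercase 'I' in a tweet may stand for either Turkish letter."""
    return text.replace("İ", "i").replace("I", "i").replace("ı", "i").lower()


def normalize(token):
    """Hypothetical normalization rule: collapse a run of three or more
    identical characters into one (an assumption, not the paper's scheme)."""
    return re.sub(r"(.)\1{2,}", r"\1", token)


def find_entities(tokens, lexicon, relax_capitalization=False):
    """Return the tokens that match the lexicon.

    Strict mode requires an initial capital and an exact match.
    Relaxed mode ignores case entirely.
    """
    if not relax_capitalization:
        return [t for t in tokens if t[:1].isupper() and t in lexicon]
    keys = {case_key(entry) for entry in lexicon}
    return [t for t in tokens if case_key(t) in keys]


# Toy lexicon and tweet, both invented for this example.
lexicon = expand_with_ascii_variants({"Beşiktaş", "İstanbul"})
tokens = [normalize(t) for t in "besiktaaas bugün istanbul da".split()]
strict_hits = find_entities(tokens, lexicon)
relaxed_hits = find_entities(tokens, lexicon, relax_capitalization=True)
```

In this toy tweet the strict setting finds nothing, because no token is capitalized, while the relaxed setting with the expanded lexicon matches both place and club names. Relaxing constraints this way can also raise false positives, which is why the effect of each setting has to be measured on annotated data.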
