CL · LG · ML · Oct 20, 2018

Named Entity Recognition on Twitter for Turkish using Semi-supervised Learning with Word Embeddings

arXiv:1810.08732v1 · 1088 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of extracting information from informal social media text in Turkish, and its approach may be adaptable to other morphologically rich languages. The contribution is incremental, since it builds on existing NER methods.

The authors tackled Named Entity Recognition (NER) on informal Turkish Twitter text using a semi-supervised learning approach with word embeddings and language-independent features, reporting higher F-scores on Turkish tweets than previously published systems.
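This page does not spell out the evaluation protocol behind the F-score comparison. A common choice for NER is exact-match, entity-level scoring. The sketch below assumes BIO-tagged token sequences and counts a predicted entity as correct only when its start, end, and type all match a gold entity. The helper names `bio_to_spans` and `entity_f1` are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of entity spans."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        if tag.startswith("I-") and tag[2:] == label:
            continue  # inside the currently open entity
        if start is not None:  # close the open entity
            spans.add((start, i, label))
            start, label = None, None
        if tag != "O":  # B-X, or an I-X that does not continue the open entity
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = bio_to_spans(["B-PER", "I-PER", "O", "B-LOC"])
    pred = bio_to_spans(["B-PER", "I-PER", "O", "B-ORG"])
    print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5): the person matches, the mistyped location does not
```

Under this strict criterion, a correctly bounded entity with the wrong type counts as both a false positive and a false negative. Lenient variants that give partial credit exist, so F-scores are only comparable across systems that use the same matching rule.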

Recently, due to the increasing popularity of social media, extracting information from informal text types such as microblog texts has gained significant attention. In this study, we focused on the Named Entity Recognition (NER) problem on informal text types for Turkish. We utilized a semi-supervised learning approach based on neural networks. We applied a fast unsupervised method for learning continuous representations of words in vector space. We used the resulting word embeddings, together with language-independent features engineered to work better on informal text types, to build a Turkish NER system for microblog texts. We evaluated our Turkish NER system on Twitter messages and achieved higher F-scores than the published results of previously proposed NER systems on Turkish tweets. Since we did not employ any language-dependent features, we believe that our method can be easily adapted to microblog texts in other morphologically rich languages.
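The abstract names neither the embedding method nor the individual features. The sketch below is therefore a hypothetical illustration of the general idea: pretrained word vectors, learned without labels, are concatenated with language-independent, tweet-oriented surface features to form the per-token input of a window-based neural tagger. Several details are assumptions, not the authors' design: the feature list, the lowercase fallback for lookup, the zero vector for unseen words, the window size `k`, and the function names.

```python
import re
from typing import Dict, List

# Hypothetical tweet-oriented, language-independent surface features.
URL_RE = re.compile(r"^(https?://|www\.)", re.IGNORECASE)


def token_features(token: str) -> List[float]:
    """Binary surface features for one (non-empty) tweet token."""
    return [
        float(token[:1].isupper()),                  # starts with a capital letter
        float(token.isupper()),                      # all cased characters are upper case
        float(token.startswith("@")),                # user mention
        float(token.startswith("#")),                # hashtag
        float(bool(URL_RE.match(token))),            # URL
        float(any(c.isdigit() for c in token)),      # contains a digit
        float(all(not c.isalnum() for c in token)),  # punctuation or emoticon only
    ]


def token_vector(token: str, embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Embedding (exact form, then lowercased, else zeros) followed by surface features."""
    vec = embeddings.get(token) or embeddings.get(token.lower()) or [0.0] * dim
    return list(vec) + token_features(token)


def window_input(tokens: List[str], i: int, embeddings: Dict[str, List[float]],
                 dim: int, k: int = 2) -> List[float]:
    """Concatenate vectors for tokens i-k .. i+k, zero-padding outside the tweet."""
    pad = [0.0] * (dim + len(token_features("x")))
    out: List[float] = []
    for j in range(i - k, i + k + 1):
        out.extend(token_vector(tokens[j], embeddings, dim) if 0 <= j < len(tokens) else pad)
    return out


if __name__ == "__main__":
    tweet = ["@ahmet", "Ankara'ya", "geldi", "http://t.co/x"]
    emb = {"ankara'ya": [0.1, 0.2], "geldi": [0.3, 0.4]}  # toy 2-d vectors, not real embeddings
    print(token_vector("Ankara'ya", emb, dim=2))
    print(len(window_input(tweet, 0, emb, dim=2)))  # 5 positions x 9 values = 45
```

None of these features depends on Turkish morphology, which is what makes them transferable. One Turkish-specific caveat does apply to the lookup step: Python's `str.lower()` is not locale-aware and maps "I" to "i" rather than the dotless "ı", so a real system would need Turkish-aware case folding.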
