CL · Dec 17, 2019

To What Extent are Name Variants Used as Named Entities in Turkish Tweets?

arXiv:1912.07940v10.2 Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of named entity recognition in Turkish social media texts. Its contribution is incremental: it focuses on analysis and annotation rather than on new methods.

The paper analyzes how prevalent informal name variants, such as abbreviations and nicknames, are in Turkish tweets compared to well-formed named entities, and provides publicly available annotations for these categories.

Social media texts differ from regular texts in various aspects. One of the main differences is the common use of informal name variants instead of well-formed named entities in social media compared to regular texts. These name variants may come in the form of abbreviations, nicknames, contractions, and hypocoristic uses, in addition to names distorted due to capitalization and writing errors. In this paper, we present an analysis of the named entities in a publicly-available tweet dataset in Turkish with respect to their being name variants belonging to different categories. We also provide finer-grained annotations of the named entities as well-formed names and different categories of name variants, where these annotations are made publicly-available. The analysis presented and the accompanying annotations will contribute to related research on the treatment of named entities in social media.
