Assessing Emoji Use in Modern Text Processing Tools
This research identifies a gap in the ability of current NLP tools to process emoji-containing text effectively, a problem for anyone working with modern digital communication data.
This study investigates the performance of prominent NLP and text processing tools when handling text containing emojis, specifically focusing on tokenization, part-of-speech tagging, and sentiment analysis. The results indicate that many tools exhibit significant deficiencies when processing emoji-rich text.
Emojis have become ubiquitous in digital communication, owing in part to their visual appeal and their ability to convey human emotion vividly. Their growing prominence in social media and instant messaging creates an increasing need for systems and tools that can operate on text containing emojis. In this study, we assess how well such text is supported: using test sets of tweets with emojis, we perform a series of experiments investigating the ability of prominent NLP and text processing tools to process them adequately. In particular, we consider tokenization, part-of-speech tagging, and sentiment analysis. Our findings show that many tools still have notable shortcomings when operating on text containing emojis.
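To illustrate the kind of tokenization problem at issue, the following minimal sketch contrasts a naive regex tokenizer with a slightly emoji-aware one. It is not taken from the study or from any of the tools it evaluates: the patterns `NAIVE` and `EMOJI_AWARE`, the helper `tokenize`, and the chosen code point ranges are assumptions made for illustration, and they cover only a small part of the emoji sequences defined by Unicode.

```python
import re

# Hypothetical naive tokenizer: runs of word characters, or any single
# non-space symbol. Emoji modifiers and joiners become separate tokens.
NAIVE = re.compile(r"\w+|[^\w\s]")

# Illustrative emoji unit (an assumption, not full Unicode coverage):
# one code point from common emoji ranges, optionally followed by a
# variation selector (U+FE0F) or a skin-tone modifier (U+1F3FB..U+1F3FF).
EMOJI_UNIT = r"[\U0001F300-\U0001FAFF\u2600-\u27BF][\uFE0F\U0001F3FB-\U0001F3FF]?"

# Emoji units glued by zero-width joiners (U+200D) are kept as one token;
# everything else falls back to the naive rules.
EMOJI_AWARE = re.compile(rf"{EMOJI_UNIT}(?:\u200D{EMOJI_UNIT})*|\w+|[^\w\s]")


def tokenize(text, pattern):
    """Return all non-overlapping matches of `pattern` in `text`."""
    return pattern.findall(text)


# Thumbs-up (U+1F44D) with a medium skin-tone modifier (U+1F3FD).
thumbs = "Great job \U0001F44D\U0001F3FD!"
# Family emoji: man, ZWJ, woman, ZWJ, girl.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

naive_thumbs = tokenize(thumbs, NAIVE)        # 5 tokens: modifier split off
aware_thumbs = tokenize(thumbs, EMOJI_AWARE)  # 4 tokens: emoji kept whole
naive_family = tokenize(family, NAIVE)        # 5 tokens: one per code point
aware_family = tokenize(family, EMOJI_AWARE)  # 1 token

print(len(naive_thumbs), len(aware_thumbs))
print(len(naive_family), len(aware_family))
```

In this sketch the naive pattern separates the skin-tone modifier from its base emoji and breaks the family sequence into five pieces, while the emoji-aware pattern keeps each sequence as a single token. A production tokenizer would more plausibly rely on Unicode grapheme cluster segmentation than on hand-written ranges like these.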