CL · Dec 4, 2020

FinnSentiment -- A Finnish Social Media Corpus for Sentiment Polarity Annotation

arXiv:2012.02613v1 · 1 citation
AI Analysis

This dataset addresses the lack of large-scale, sentiment-annotated social media data for Finnish, which is crucial for developing sentiment analysis models for the Finnish language.

The authors created a 27,000-sentence dataset of Finnish social media posts, annotated for sentiment polarity by three native speakers. They analyzed inter-annotator agreement and established two baselines to demonstrate the dataset's utility.

Sentiment analysis and opinion mining are important tasks with obvious application areas in social media, e.g. in identifying hate speech and fake news. In our survey of previous work, we note that there is no large-scale social media data set with sentiment polarity annotations for Finnish. This publication aims to remedy this shortcoming by introducing a 27,000-sentence data set annotated independently with sentiment polarity by three native annotators. We had the same three annotators for the whole data set, which provides a unique opportunity for further studies of annotator behaviour over time. We analyse their inter-annotator agreement and provide two baselines to validate the usefulness of the data set.
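
As a rough illustration of what an inter-annotator agreement analysis for three annotators can look like, the sketch below computes Fleiss' kappa on a handful of made-up items. The abstract does not say which agreement statistic the authors used, so Fleiss' kappa is an assumption here, as are the label names (`positive`, `neutral`, `negative`), the function name `fleiss_kappa`, and the toy data, none of which come from the corpus.

```python
from collections import Counter

# Hypothetical label set; the abstract only says "sentiment polarity".
LABELS = ["positive", "neutral", "negative"]


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labelled by the same number of raters.

    ratings: list of per-item label lists, e.g. [["positive", "neutral", "positive"], ...]
    categories: list of all possible labels
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # For every item, count how many raters chose each category.
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(k * k for k in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    total = n_items * n_raters
    proportions = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1:
        # Every rating fell in one category; kappa is undefined, so report full agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data only: four invented sentences, three annotators each.
toy_ratings = [
    ["positive", "positive", "positive"],
    ["negative", "negative", "neutral"],
    ["neutral", "neutral", "neutral"],
    ["negative", "positive", "neutral"],
]

if __name__ == "__main__":
    print(f"Fleiss' kappa on toy data: {fleiss_kappa(toy_ratings, LABELS):.3f}")
```

Because the same three annotators labelled the whole data set, the same function could in principle be applied to successive slices of the data to look at agreement over time, though that is a suggestion rather than something the abstract describes.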
