CL, Mar 22, 2018

MultiBooked: A Corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification

arXiv:1803.08614v11094 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the shortage of data for researchers working on sentiment analysis in Basque and Catalan. Its contribution is incremental: it creates new datasets rather than proposing novel methods.

The authors tackle the lack of sentiment analysis resources for under-resourced languages by introducing two datasets of Basque and Catalan hotel reviews with aspect-level annotations, along with benchmarks to support supervised approaches.

While sentiment analysis has become an established field in the NLP community, research into languages other than English has been hindered by the lack of resources. Although much research in multi-lingual and cross-lingual sentiment analysis has focused on unsupervised or semi-supervised approaches, these still require a large number of resources and do not reach the performance of supervised approaches. With this in mind, we introduce two datasets for supervised aspect-level sentiment analysis in Basque and Catalan, both of which are under-resourced languages. We provide high-quality annotations and benchmarks with the hope that they will be useful to the growing community of researchers working on these languages.

