CL, LG · Sep 25, 2019

Atalaya at TASS 2019: Data Augmentation and Robust Embeddings for Sentiment Analysis

arXiv:1909.11241v1 · 30 citations
Originality: Incremental advance
AI Analysis

This work addresses sentiment polarity detection for Spanish tweets; it is an incremental advance on a domain-specific task.

The paper tackled sentiment analysis of Spanish tweets in the TASS 2019 shared task by combining bag-of-words, bag-of-characters, and robust subword-aware embeddings with data augmentation techniques, achieving highly competitive results.
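To make the bag-of-words and bag-of-characters representations concrete, here is a minimal sketch of how such sparse count features could be built for a single tweet. It assumes plain lowercase whitespace tokenization and character n-grams of length 2 to 4; the function names, the n-gram range and the `w:`/`c:` key prefixes are hypothetical choices for illustration, not the settings reported by the authors.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Assumption: lowercase + whitespace split. A real tweet tokenizer would
    # also normalize URLs, @mentions, hashtags and punctuation.
    return Counter(text.lower().split())


def bag_of_chars(text: str, n_min: int = 2, n_max: int = 4) -> Counter:
    # Pad with spaces so n-grams at word boundaries are distinguishable
    # from word-internal ones (hypothetical n-gram range).
    padded = f" {text.lower()} "
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def combined_features(text: str) -> Counter:
    # Prefix keys so word features and character features never collide,
    # then merge both into one sparse feature dictionary.
    feats = Counter({f"w:{k}": v for k, v in bag_of_words(text).items()})
    feats.update({f"c:{k}": v for k, v in bag_of_chars(text).items()})
    return feats


if __name__ == "__main__":
    feats = combined_features("Qué buen día")
    print(feats["w:buen"], feats["c:bu"])
```

Character n-grams are one common way to make such features tolerant of the misspellings and elongations typical of tweets; the resulting dictionaries could then be vectorized and fed to a linear classifier.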

In this article we describe our participation in TASS 2019, a shared task aimed at the detection of sentiment polarity of Spanish tweets. We combined different representations such as bag-of-words, bag-of-characters, and tweet embeddings. In particular, we trained robust subword-aware word embeddings and computed tweet representations using a weighted-averaging strategy. We also used two data augmentation techniques to deal with data scarcity: two-way translation augmentation, and instance crossover augmentation, a novel technique that generates new instances by combining halves of tweets. In experiments, we trained linear classifiers and ensemble models, obtaining highly competitive results despite the simplicity of our approaches.
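The instance crossover idea (new instances built by combining halves of tweets) can be sketched as follows. This is an illustrative reading under stated assumptions, not the authors' implementation: it assumes halves are taken at the token midpoint, that both parent tweets share the same polarity label so the synthetic tweet inherits it, and that parents are drawn uniformly at random. The function names and the `seed` parameter are hypothetical.

```python
import random


def split_halves(tokens):
    # Split a token list at its midpoint (assumption: token-level split).
    mid = len(tokens) // 2
    return tokens[:mid], tokens[mid:]


def instance_crossover(dataset, n_new, seed=0):
    """Generate n_new synthetic (text, label) pairs from dataset.

    dataset: list of (text, label) tuples. Assumption: only tweets with the
    same label are combined, and the child keeps that label.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in dataset:
        by_label.setdefault(label, []).append(text.split())
    # A label needs at least two tweets to provide two distinct parents.
    labels = [lab for lab, tweets in by_label.items() if len(tweets) >= 2]
    if not labels:
        raise ValueError("need at least two tweets sharing a label")
    new_instances = []
    for _ in range(n_new):
        label = rng.choice(labels)
        parent_a, parent_b = rng.sample(by_label[label], 2)
        first_half, _ = split_halves(parent_a)
        _, second_half = split_halves(parent_b)
        new_instances.append((" ".join(first_half + second_half), label))
    return new_instances


if __name__ == "__main__":
    data = [("me encanta este día", "P"),
            ("qué alegría tan grande", "P"),
            ("odio los lunes", "N")]
    for text, label in instance_crossover(data, 3):
        print(label, text)
```

The output sentences are often ungrammatical, but for bag-of-words style classifiers they still add label-consistent lexical evidence, which is the point of augmenting a small training set.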
