Atalaya at TASS 2019: Data Augmentation and Robust Embeddings for Sentiment Analysis
This work addresses sentiment polarity detection for Spanish tweets, which is an incremental improvement in a domain-specific task.
The paper tackled sentiment analysis of Spanish tweets in the TASS 2019 shared task by combining bag-of-words, bag-of-characters, and robust subword-aware embeddings with data augmentation techniques, achieving highly competitive results.
In this article we describe our participation in TASS 2019, a shared task aimed at the detection of sentiment polarity of Spanish tweets. We combined different representations such as bag-of-words, bag-of-characters, and tweet embeddings. In particular, we trained robust subword-aware word embeddings and computed tweet representations using a weighted-averaging strategy. We also used two data augmentation techniques to deal with data scarcity: two-way translation augmentation, and instance crossover augmentation, a novel technique that generates new instances by combining halves of tweets. In experiments, we trained linear classifiers and ensemble models, obtaining highly competitive results despite the simplicity of our approaches.