CL Oct 25, 2018

Learning Emotion from 100 Observations: Unexpected Robustness of Deep Learning under Strong Data Limitations

Sven Buechel, João Sedoc, H. Andrew Schwartz, Lyle Ungar

arXiv:1810.10949v3 31.0992 citations

Originality: Incremental advance

AI Analysis

This result challenges a common belief in NLP and could enable emotion analysis in data-scarce scenarios such as less-resourced languages. The advance is incremental, however: its contribution is demonstrating robustness under data limitations.

The paper addresses the perception that deep learning needs large datasets for emotion analysis. It shows that, for English, Polish, and Portuguese, commonly used neural architectures outperform $n$-gram based ridge regression when trained on only 100 data points.

One of the major downsides of Deep Learning is its supposed need for vast amounts of training data. As such, these techniques appear ill-suited for NLP areas where annotated data is limited, such as less-resourced languages or emotion analysis, with its many nuanced and hard-to-acquire annotation formats. We conduct a questionnaire study indicating that indeed the vast majority of researchers in emotion analysis deems neural models inferior to traditional machine learning when training data is limited. In stark contrast to those survey results, we provide empirical evidence for English, Polish, and Portuguese that commonly used neural architectures can be trained on surprisingly few observations, outperforming $n$-gram based ridge regression on only 100 data points. Our analysis suggests that high-quality, pre-trained word embeddings are a main factor for achieving those results.
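
To make the baseline named in the abstract concrete, here is a minimal sketch of an $n$-gram based ridge regression. It is not the authors' implementation. The tokenization (lowercased whitespace splitting), the use of unigram and bigram counts, the regularization strength `lam`, the intercept handling (the target mean, with uncentered features), and all function names are assumptions made for illustration. The sketch solves ridge regression in its dual form, which only needs an $N \times N$ system for $N$ training texts and therefore suits the 100-observation setting.

```python
from collections import Counter


def ngram_counts(text, n_max=2):
    """Count word n-grams (n = 1..n_max) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def dot(a, b):
    """Sparse dot product of two n-gram count dictionaries."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[k] for k, v in a.items() if k in b)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ridge(texts, targets, lam=1.0):
    """Dual-form ridge: alpha = (X X^T + lam * I)^{-1} (y - mean(y))."""
    feats = [ngram_counts(t) for t in texts]
    mean_y = sum(targets) / len(targets)
    n = len(texts)
    K = [[dot(feats[i], feats[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [y - mean_y for y in targets])
    return feats, alpha, mean_y


def predict(model, text):
    """Predict a score as mean(y) + sum_i alpha_i * <x_i, x>."""
    feats, alpha, mean_y = model
    x = ngram_counts(text)
    return mean_y + sum(a * dot(f, x) for f, a in zip(feats, alpha))


# Toy, made-up data (not from the paper): texts with a valence-like score.
texts = ["i am very happy", "so happy today", "i am very sad", "so sad today"]
targets = [1.0, 1.0, -1.0, -1.0]
model = fit_ridge(texts, targets, lam=0.1)
print(predict(model, "happy"), predict(model, "sad"))
```

With a real emotion dataset, `texts` and `targets` would hold the 100 sampled training observations and `lam` would be tuned on held-out data.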
