CL · Sep 26, 2022

Lex2Sent: A bagging approach to unsupervised sentiment analysis

Kai-Robin Lange, Jonas Rieger, Carsten Jentsch

arXiv:2209.13023v25.423 citations · h-index: 16 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses sentiment analysis for users needing efficient, hardware-light methods, but it is incremental as it builds on existing lexicon-based techniques.

The paper tackles unsupervised sentiment analysis by proposing Lex2Sent, a bagging approach that improves on classic lexicon methods without requiring GPU hardware. It outperforms lexica and serves as a basis for high-performing few-shot fine-tuning.

Unsupervised text classification, with its most common form being sentiment analysis, used to be performed by counting the words in a text that were stored in a lexicon, which assigns each word to one class or marks it as neutral. In recent years, these lexicon-based methods have fallen out of favor and been replaced by computationally demanding fine-tuning techniques for encoder-only models such as BERT and by zero-shot classification using decoder-only models such as GPT-4. In this paper, we propose an alternative approach: Lex2Sent, which improves on classic lexicon methods but does not require any GPU or external hardware. To classify texts, we train embedding models to determine the distances between document embeddings and the embeddings of the parts of a suitable lexicon. We employ resampling, which results in a bagging effect, boosting the performance of the classification. We show that our model outperforms lexica and provides a basis for a high-performing few-shot fine-tuning approach in the task of binary sentiment analysis.
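
The idea in the abstract (compare each document embedding with the embeddings of the two lexicon halves, then aggregate over resampled runs) can be illustrated with a toy sketch. This is not the authors' implementation. A crude co-occurrence embedding stands in for the trained embedding models, resampling each document's tokens with replacement is only one assumed resampling scheme, and the function names, corpus, and lexicon below are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def bagged_lexicon_scores(docs, pos_words, neg_words, rounds=50, seed=0):
    """Sum, over resampled rounds, how much closer each document is to the
    positive lexicon half than to the negative one."""
    rng = random.Random(seed)
    dim = len(docs)
    scores = [0.0] * dim
    for _ in range(rounds):
        # Assumed resampling scheme: draw each document's tokens with replacement.
        sampled = [rng.choices(doc, k=len(doc)) for doc in docs]

        # Stand-in embedding (an assumption, not a trained model):
        # a word's vector counts its occurrences in each resampled document.
        word_vecs = defaultdict(lambda: [0.0] * dim)
        for i, doc in enumerate(sampled):
            for tok in doc:
                word_vecs[tok][i] += 1.0

        # Embed each lexicon half as the mean of its words seen this round.
        pos_vec = mean_vector([word_vecs[w] for w in pos_words if w in word_vecs], dim)
        neg_vec = mean_vector([word_vecs[w] for w in neg_words if w in word_vecs], dim)

        for i, doc in enumerate(sampled):
            doc_vec = mean_vector([word_vecs[t] for t in doc], dim)
            # Cosine distance is 1 - similarity, so this difference equals
            # (distance to negative half) - (distance to positive half).
            scores[i] += cosine(doc_vec, pos_vec) - cosine(doc_vec, neg_vec)
    return scores


# Hypothetical toy corpus; the last two documents contain no lexicon words.
docs = [
    ["great", "movie", "loved", "it"],
    ["loved", "acting", "great", "fun"],
    ["terrible", "plot", "boring", "film"],
    ["boring", "terrible", "waste", "film"],
    ["fun", "acting", "movie"],
    ["waste", "plot", "film"],
]
POS = ["great", "loved"]
NEG = ["terrible", "boring"]

scores = bagged_lexicon_scores(docs, POS, NEG)
labels = ["pos" if s > 0 else "neg" for s in scores]
print(labels)
```

In this sketch the last two documents contain no lexicon words, so plain lexicon counting would leave them unclassified. The embedding comparison can still score them through words they share with documents that do contain lexicon words, and summing over many resampled rounds smooths out rounds in which those shared words happen to be missing.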
