LG, AI, CL, IR · Oct 15, 2024

Reducing Labeling Costs in Sentiment Analysis via Semi-Supervised Learning

arXiv:2410.11355v1 · 4.63 citations · h-index: 2 · NLPIR

Originality: Synthesis-oriented

AI Analysis

This work addresses labeling cost reduction for sentiment analysis tasks, but it appears incremental as it applies existing semi-supervised techniques to a new domain.

The paper tackles the problem of high labeling costs in sentiment analysis by using a graph-based semi-supervised learning method with label propagation, which reduces the number of required labels compared to traditional methods.

Labeling datasets is a significant challenge in machine learning, in terms of both cost and time. This research, however, offers an efficient remedy. By exploring label propagation in semi-supervised learning, we can significantly reduce the number of labels required compared to traditional methods. We employ a transductive label propagation method based on the manifold assumption for text classification. Our approach uses a graph-based method to generate pseudo-labels for the unlabeled data, which are then used to train deep neural networks. By extending labels according to cosine proximity within a nearest-neighbor graph built from network embeddings, we incorporate unlabeled data into supervised learning and thereby reduce labeling costs. Building on the approach's previous successes in other domains, this study implements it for sentiment analysis and evaluates its effectiveness, offering insights into semi-supervised learning.
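To make the pipeline described in the abstract more concrete, here is a minimal sketch, assuming sentence embeddings have already been computed by some network. It builds a symmetric k-nearest-neighbor graph weighted by cosine similarity, then runs classic iterative label propagation in which each node takes the degree-normalized weighted average of its neighbors' label distributions and labeled nodes are re-clamped to their ground truth after every step. The function names (`knn_graph`, `propagate_labels`), the choice of k, the iteration count, and the clamping scheme are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def knn_graph(embeddings: List[List[float]], k: int) -> Matrix:
    """Symmetric k-nearest-neighbour affinity matrix weighted by cosine similarity."""
    n = len(embeddings)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for s, j in sims[:k]:
            s = max(s, 0.0)  # negative similarities are dropped (weight 0)
            # keep the edge if either endpoint lists the other as a neighbour
            w[i][j] = max(w[i][j], s)
            w[j][i] = max(w[j][i], s)
    return w


def propagate_labels(
    w: Matrix, labels: List[Optional[int]], num_classes: int, iters: int = 50
) -> List[Tuple[Optional[int], float]]:
    """Iterative label propagation: F <- D^-1 W F, then re-clamp labeled rows.

    labels[i] is a class index for labeled nodes and None for unlabeled ones.
    Returns (pseudo_label, confidence) per node; (None, 0.0) if no label reached it.
    """
    n = len(w)

    def clamp(f: Matrix) -> None:
        # labeled nodes keep their one-hot ground-truth distribution
        for i, y in enumerate(labels):
            if y is not None:
                f[i] = [1.0 if c == y else 0.0 for c in range(num_classes)]

    f = [[0.0] * num_classes for _ in range(n)]
    clamp(f)
    for _ in range(iters):
        new_f = []
        for i in range(n):
            deg = sum(w[i])  # positive whenever node i has at least one edge
            row = [0.0] * num_classes
            for j in range(n):
                if w[i][j] > 0:
                    for c in range(num_classes):
                        row[c] += w[i][j] * f[j][c] / deg
            new_f.append(row)
        f = new_f
        clamp(f)

    result: List[Tuple[Optional[int], float]] = []
    for row in f:
        total = sum(row)
        if total == 0:
            result.append((None, 0.0))
        else:
            best = max(range(num_classes), key=lambda c: row[c])
            result.append((best, row[best] / total))
    return result


if __name__ == "__main__":
    # toy 2-D "embeddings": two directions standing in for positive / negative reviews
    emb = [[1.0, 0.1], [0.9, 0.2], [0.95, 0.05],
           [-1.0, 0.1], [-0.9, 0.2], [-0.95, 0.05]]
    labels = [1, None, None, 0, None, None]  # one labeled example per class
    result = propagate_labels(knn_graph(emb, k=2), labels, num_classes=2)
    print([c for c, _ in result])  # → [1, 1, 1, 0, 0, 0]
```

The returned confidence could then be used to keep only confident pseudo-labels (for example, a hypothetical threshold of 0.9) before training the downstream neural network on the labeled plus pseudo-labeled examples. Clamping the labeled rows is one design choice among several; label-spreading variants instead blend the propagated distribution with the initial labels at every step.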
