Block-Segmentation Vectors for Arousal Prediction using Semi-supervised Learning
This work addresses a specific issue in sentiment analysis for applications handling emotional expressions, but it is incremental as it builds on existing SentiWordNet methods.
The paper tackles the prediction of arousal levels for SentiWordNet synsets, a task on which the existing method reduces the accuracy of sentiment prediction. It proposes block-segmentation vectors used with semi-supervised learning and shows in evaluation experiments that this approach outperforms the previous method.
Russell's circumplex model, which represents emotions according to valence and arousal, has been useful for handling emotional expressions in computer applications. In SentiWordNet, the level of valence is automatically assigned to a large number of synsets (groups of synonyms in WordNet) using semi-supervised learning. However, when the existing method proposed for SentiWordNet is used to assign the level of arousal, the accuracy of sentiment prediction is reduced. In this paper, we propose a block-segmentation vector for predicting the arousal levels of many synsets from a small number of labeled words using semi-supervised learning. We analyze the distribution of arousal and non-arousal words in a corpus of sentences and compare it with the distribution of valence words. We address the problem that arousal-level prediction fails when arousal and non-arousal words are mixed together in some sentences. To capture the features of such arousal and non-arousal words, we divide the corpus into blocks that follow the flow of its sentences and generate word vectors from inverted indexes of block IDs. In the evaluation experiment, we show that arousal prediction with the block-segmentation vectors outperforms the previous method in SentiWordNet.
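The idea of word vectors built from inverted indexes of block IDs can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_block_vectors`, the fixed `block_size` (a count of consecutive sentences per block), the whitespace tokenization, and the binary vector encoding are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


def build_block_vectors(sentences, block_size):
    """Return a dict mapping each word to a binary vector over block IDs.

    Assumptions (not taken from the paper): a block is `block_size`
    consecutive sentences, tokens are split on whitespace, and a vector
    entry is 1 if the word occurs anywhere in that block, else 0.
    """
    if block_size < 1:
        raise ValueError("block_size must be >= 1")

    # Inverted index: word -> set of block IDs in which the word occurs.
    inverted_index = defaultdict(set)
    for sentence_no, sentence in enumerate(sentences):
        block_id = sentence_no // block_size  # blocks follow sentence order
        for word in sentence.lower().split():
            inverted_index[word].add(block_id)

    # Number of blocks, rounding up so a short final block is still counted.
    n_blocks = (len(sentences) + block_size - 1) // block_size

    # Expand each posting set into a fixed-length binary vector.
    return {
        word: [1 if b in block_ids else 0 for b in range(n_blocks)]
        for word, block_ids in inverted_index.items()
    }


# Toy corpus (hypothetical): two high-arousal sentences, then two calm ones.
sentences = [
    "the crowd was thrilled and excited",
    "fans were excited and loud",
    "the room was calm and quiet",
    "a quiet and sleepy afternoon",
]
vectors = build_block_vectors(sentences, block_size=2)
```

In this toy corpus, `excited` appears only in the first block and `quiet` only in the second, so their vectors differ, while a function word such as `and` appears in both blocks. Vectors of this kind could then serve as features for a semi-supervised classifier seeded with a small number of labeled words; that step is omitted here.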