CL ME Apr 29

A New Semisupervised Technique for Polarity Analysis using Masked Language Models

arXiv:2604.26230 6.7

AI Analysis

For researchers in text analysis and sentiment analysis, this provides a more reliable semisupervised technique for polarity analysis, though the improvement is incremental.

The paper introduces a probabilistic version of Latent Semantic Scaling (LSS) using word2vec as a masked language model, which assigns polarity scores as predicted probabilities of seed words. Applied to China Daily's COVID coverage, it shows improved accuracy, interpretability, and consistency over spatial models.

I developed a new version of Latent Semantic Scaling (LSS) that employs word2vec as a masked language model. Unlike the original spatial models, it assigns polarity scores to words and documents as the predicted probabilities that seed words occur in given contexts. These probabilistic polarity scores are more accurate, interpretable, and consistent than those that spatial models can produce in text analysis. I demonstrate these advantages by applying both probabilistic and spatial models to China Daily's coverage of China and other countries during the coronavirus disease (COVID) pandemic, measuring polarity in terms of achievement in health issues. The results suggest that more advanced masked language models would further improve this semisupervised machine learning technique.
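The idea of scoring polarity as the predicted probability of seed words in a context can be illustrated with a minimal sketch. It is not the paper's implementation: the toy vocabulary, the random embedding matrices `W_in` and `W_out`, the seed lists, the full-softmax prediction step, and the "positive minus negative probability" score are all assumptions made for illustration. Real word2vec training usually relies on negative sampling or hierarchical softmax, and the abstract does not specify the exact scoring formula.

```python
import numpy as np

# Toy setup (hypothetical): random vectors stand in for trained word2vec weights.
rng = np.random.default_rng(0)
vocab = ["good", "success", "bad", "failure", "vaccine", "outbreak", "recovery"]
idx = {w: i for i, w in enumerate(vocab)}
dim = 8
W_in = rng.normal(size=(len(vocab), dim))   # input (context) embeddings
W_out = rng.normal(size=(len(vocab), dim))  # output (target) embeddings


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def predict(context):
    """CBOW-style prediction: probability of each vocabulary word
    filling a masked slot, given the averaged context vectors."""
    h = W_in[[idx[w] for w in context]].mean(axis=0)
    return softmax(W_out @ h)


def polarity(context, pos_seeds, neg_seeds):
    """Assumed score: total probability of positive seed words
    minus total probability of negative seed words."""
    p = predict(context)
    p_pos = sum(p[idx[s]] for s in pos_seeds)
    p_neg = sum(p[idx[s]] for s in neg_seeds)
    return float(p_pos - p_neg)


pos, neg = ["good", "success"], ["bad", "failure"]
word_score = polarity(["vaccine"], pos, neg)                        # one word as context
doc_score = polarity(["vaccine", "recovery", "outbreak"], pos, neg)  # several words as context
print(word_score, doc_score)
```

Because the scores in this sketch are differences of probabilities, they always fall between -1 and 1, which is one way to read the interpretability claim: a score has the same meaning regardless of the corpus or the embedding dimensions.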
