Jasper Kyle Catapang

CY Oct 11, 2021

Topic Modeling, Clade-assisted Sentiment Analysis, and Vaccine Brand Reputation Analysis of COVID-19 Vaccine-related Facebook Comments in the Philippines

Jasper Kyle Catapang, Jerome V. Cleofas

Vaccine hesitancy and other COVID-19-related concerns and complaints in the Philippines are evident on social media. It is important to identify these different topics and sentiments in order to gauge public opinion, use the insights to develop policies, and make necessary adjustments or actions to improve the public image and reputation of the administering agency and the COVID-19 vaccines themselves. This paper proposes a semi-supervised machine learning pipeline to perform topic modeling, sentiment analysis, and an analysis of vaccine brand reputation to obtain an in-depth understanding of national public opinion of Filipinos on Facebook. The methodology makes use of a multilingual version of Bidirectional Encoder Representations from Transformers (BERT) for topic modeling, hierarchical clustering, five different classifiers for sentiment analysis, and cosine similarity of BERT topic embeddings for vaccine brand reputation analysis. Results suggest that any type of COVID-19 misinformation is an emergent property of COVID-19 public opinion, and that the detection of COVID-19 misinformation can be an unsupervised task. Sentiment analysis aided by hierarchical clustering reveals that 21 of the 25 topics extracted by topic modeling are negative topics. Such negative comments spike in count whenever the Department of Health in the Philippines posts about the COVID-19 situation in other countries. Additionally, the high numbers of laugh reactions on the Facebook posts by the same agency, none of which contain humorous content, suggest that the reactors of these posts tend to react the way they do not because of what the posts are about but because of who posted them.
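The abstract above names cosine similarity of BERT topic embeddings as the basis for the brand reputation analysis. As a minimal sketch of that one step only, the snippet below ranks topics by cosine similarity to a brand vector. The function names, the labels, and the 3-dimensional toy vectors are all hypothetical stand-ins; real BERT embeddings would have far more dimensions, and the paper's actual pipeline is not described here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_topics_for_brand(brand_vec, topic_vecs):
    """Return (topic_label, similarity) pairs, most similar first."""
    scores = [(label, cosine_similarity(brand_vec, vec))
              for label, vec in topic_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for topic embeddings (made up for illustration).
topic_vecs = {
    "topic_a": [1.0, 0.0, 0.0],
    "topic_b": [0.0, 1.0, 0.0],
}
# Hypothetical embedding of a vaccine brand name.
brand_vec = [0.9, 0.1, 0.0]

ranking = rank_topics_for_brand(brand_vec, topic_vecs)
```

Under this reading, a brand that sits close to predominantly negative topics would be flagged as having a weaker reputation; how the paper aggregates these similarities is an assumption left open here.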

LG Jun 11, 2019

k-Nearest Neighbor Optimization via Randomized Hyperstructure Convex Hull

Jasper Kyle Catapang

In the k-nearest neighbor algorithm (k-NN), the class of a test instance is usually determined by a majority vote, which may ignore the similarities among data. This research proposes an approach that fine-tunes the selection of neighbors passed to the majority vote by constructing a random n-dimensional hyperstructure around the test instance, controlled by a new threshold parameter. On the Haberman's Cancer Survival dataset, the proposed k-NN algorithm reaches 85.71% accuracy, compared with 80.95% for the conventional k-NN algorithm; on the Seeds dataset, it reaches 94.44%, compared with 88.89%. The proposed k-NN algorithm is also on par with the accuracy of the conventional support vector machine algorithm, including on the Banknote Authentication and Iris datasets, and surpasses it on the Seeds dataset.
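The general idea of filtering neighbors before the majority vote can be illustrated with a much simpler stand-in. The sketch below is not the randomized hyperstructure convex hull from the abstract: it swaps in a plain distance cutoff, with a hypothetical `threshold` parameter, to show how pruning dissimilar neighbors can change the vote. All names and data are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3, threshold=None):
    """Classify x by majority vote among its k nearest training points.

    If threshold is given, neighbors farther than threshold from x are
    dropped before voting (a simple stand-in for a neighbor-selection
    step; not the paper's hyperstructure construction).
    """
    # Pair each training point's distance to x with its label, nearest first.
    by_distance = sorted(
        ((math.dist(point, x), label) for point, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = by_distance[:k]
    if threshold is not None:
        kept = [(d, label) for d, label in nearest if d <= threshold]
        # Fall back to plain k-NN if the filter removes every neighbor.
        if kept:
            nearest = kept
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Two points of class "a" near the query, three of class "b" far away.
train_X = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.0, 3.1), (3.1, 3.0)]
train_y = ["a", "a", "b", "b", "b"]
query = (0.05, 0.0)

plain = knn_predict(train_X, train_y, query, k=5)
filtered = knn_predict(train_X, train_y, query, k=5, threshold=1.0)
```

With `k=5`, the plain vote is outnumbered by the distant class "b" points, while the filtered vote considers only the two nearby class "a" points. This is the kind of case where ignoring similarity among neighbors can mislead a majority vote.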
