Soojong Kim

h-index3

5papers

33citations

Novelty20%

AI Score31

Ranked #140,572 of 201,018 authors (top 70%)#24,861 in CL (top 77%)

5 Papers

CYApr 10, 2024

Accuracy of a Large Language Model in Distinguishing Anti- And Pro-vaccination Messages on Social Media: The Case of Human Papillomavirus Vaccination

Soojong Kim, Kwanho Kim, Claire Wonjeong Jo

Objective. Vaccination has engendered a spectrum of public opinions, with social media acting as a crucial platform for health-related discussions. The emergence of artificial intelligence technologies, such as large language models (LLMs), offers a novel opportunity to efficiently investigate public discourses. This research assesses the accuracy of ChatGPT, a widely used and freely available service built upon an LLM, for sentiment analysis to discern different stances toward Human Papillomavirus (HPV) vaccination. Methods. Messages related to HPV vaccination were collected from social media supporting different message formats: Facebook (long format) and Twitter (short format). A selection of 1,000 human-evaluated messages was input into the LLM, which generated multiple response instances containing its classification results. Accuracy was measured for each message as the level of concurrence between human and machine decisions, ranging between 0 and 1. Results. Average accuracy was notably high when 20 response instances were used to determine the machine decision of each message: .882 (SE = .021) and .750 (SE = .029) for anti- and pro-vaccination long-form; .773 (SE = .027) and .723 (SE = .029) for anti- and pro-vaccination short-form, respectively. Using only three or even one instance did not lead to a severe decrease in accuracy. However, for long-form messages, the language model exhibited significantly lower accuracy in categorizing pro-vaccination messages than anti-vaccination ones. Conclusions. ChatGPT shows potential in analyzing public opinions on HPV vaccination using social media content. However, understanding the characteristics and limitations of a language model within specific public health contexts remains imperative.

HCOct 17, 2024

Perceptions of Discriminatory Decisions of Artificial Intelligence: Unpacking the Role of Individual Characteristics

Soojong Kim

This study investigates how personal differences (digital self-efficacy, technical knowledge, belief in equality, political ideology) and demographic factors (age, education, and income) are associated with perceptions of artificial intelligence (AI) outcomes exhibiting gender and racial bias and with general attitudes towards AI. Analyses of a large-scale experiment dataset (N = 1,206) indicate that digital self-efficacy and technical knowledge are positively associated with attitudes toward AI, while liberal ideologies are negatively associated with outcome trust, higher negative emotion, and greater skepticism. Furthermore, age and income are closely connected to cognitive gaps in understanding discriminatory AI outcomes. These findings highlight the importance of promoting digital literacy skills and enhancing digital self-efficacy to maintain trust in AI and beliefs in AI usefulness and safety. The findings also suggest that the disparities in understanding problematic AI outcomes may be aligned with economic inequalities and generational gaps in society. Overall, this study sheds light on the socio-technological system in which complex interactions occur between social hierarchies, divisions, and machines that reflect and exacerbate the disparities.

CLJan 31, 2025

Large Language Models' Accuracy in Emulating Human Experts' Evaluation of Public Sentiments about Heated Tobacco Products on Social Media

Kwanho Kim, Soojong Kim

Sentiment analysis of alternative tobacco products on social media is important for tobacco control research. Large Language Models (LLMs) can help streamline the labor-intensive human sentiment analysis process. This study examined the accuracy of LLMs in replicating human sentiment evaluation of social media messages about heated tobacco products (HTPs). The research used GPT-3.5 and GPT-4 Turbo to classify 500 Facebook and 500 Twitter messages, including anti-HTPs, pro-HTPs, and neutral messages. The models evaluated each message up to 20 times, and their majority label was compared to human evaluators. Results showed that GPT-3.5 accurately replicated human sentiment 61.2% of the time for Facebook messages and 57.0% for Twitter messages. GPT-4 Turbo performed better, with 81.7% accuracy for Facebook and 77.0% for Twitter. Using three response instances, GPT-4 Turbo achieved 99% of the accuracy of twenty instances. GPT-4 Turbo also had higher accuracy for anti- and pro-HTPs messages compared to neutral ones. Misclassifications by GPT-3.5 often involved anti- or pro-HTPs messages being labeled as neutral or irrelevant, while GPT-4 Turbo showed improvements across all categories. In conclusion, LLMs can be used for sentiment analysis of HTP-related social media messages, with GPT-4 Turbo reaching around 80% accuracy compared to human experts. However, there's a risk of misrepresenting overall sentiment due to differences in accuracy across sentiment categories.

CYApr 3

Can AI be a moral victim? The role of moral patiency and ownership perceptions in ethical judgments of using AI-generated content

Hyesun Choung, Soojong Kim

The growing use of generative AI raises ethical concerns about authorship and plagiarism. This study examines how people judge the reuse of AI-generated content, focusing on moral patiency and ownership perceptions. In an experiment, participants evaluated two substantively similar manuscripts in which the original source was described as authored by a human, an AI system, or an AI agent with a human-like name. Results showed that copying AI-generated work was judged less unethical, less plagiaristic, and less guilt-inducing than copying human-authored work. Mediation analyses revealed that this leniency stemmed from lower perceptions of AI's capacity to suffer harm (moral patiency) and greater ownership attributed to the human writer reusing AI-generated content. Anthropomorphic cues shaped moral evaluations indirectly by reducing perceived ownership. These findings shed light on how people morally disengage when using AI-generated work and highlight differences in how ethical judgments are applied to human versus AI-created content.

CLJul 6, 2025

Large Language Models' Varying Accuracy in Recognizing Risk-Promoting and Health-Supporting Sentiments in Public Health Discourse: The Cases of HPV Vaccination and Heated Tobacco Products

Soojong Kim, Kwanho Kim, Hye Min Kim

Machine learning methods are increasingly applied to analyze health-related public discourse based on large-scale data, but questions remain regarding their ability to accurately detect different types of health sentiments. Especially, Large Language Models (LLMs) have gained attention as a powerful technology, yet their accuracy and feasibility in capturing different opinions and perspectives on health issues are largely unexplored. Thus, this research examines how accurate the three prominent LLMs (GPT, Gemini, and LLAMA) are in detecting risk-promoting versus health-supporting sentiments across two critical public health topics: Human Papillomavirus (HPV) vaccination and heated tobacco products (HTPs). Drawing on data from Facebook and Twitter, we curated multiple sets of messages supporting or opposing recommended health behaviors, supplemented with human annotations as the gold standard for sentiment classification. The findings indicate that all three LLMs generally demonstrate substantial accuracy in classifying risk-promoting and health-supporting sentiments, although notable discrepancies emerge by platform, health issue, and model type. Specifically, models often show higher accuracy for risk-promoting sentiment on Facebook, whereas health-supporting messages on Twitter are more accurately detected. An additional analysis also shows the challenges LLMs face in reliably detecting neutral messages. These results highlight the importance of carefully selecting and validating language models for public health analyses, particularly given potential biases in training data that may lead LLMs to overestimate or underestimate the prevalence of certain perspectives.