Eun Cheol Choi

CL
h-index67
8papers
95citations
Novelty42%
AI Score55

8 Papers

CLOct 13, 2023
Automated Claim Matching with Large Language Models: Empowering Fact-Checkers in the Fight Against Misinformation

Eun Cheol Choi, Emilio Ferrara

In today's digital era, the rapid spread of misinformation poses threats to public well-being and societal trust. As online misinformation proliferates, manual verification by fact checkers becomes increasingly challenging. We introduce FACT-GPT (Fact-checking Augmentation with Claim matching Task-oriented Generative Pre-trained Transformer), a framework designed to automate the claim matching phase of fact-checking using Large Language Models (LLMs). This framework identifies new social media content that either supports or contradicts claims previously debunked by fact-checkers. Our approach employs GPT-4 to generate a labeled dataset consisting of simulated social media posts. This data set serves as a training ground for fine-tuning more specialized LLMs. We evaluated FACT-GPT on an extensive dataset of social media content related to public health. The results indicate that our fine-tuned LLMs rival the performance of larger pre-trained LLMs in claim matching tasks, aligning closely with human annotations. This study achieves three key milestones: it provides an automated framework for enhanced fact-checking; demonstrates the potential of LLMs to complement human expertise; offers public resources, including datasets and models, to further research and applications in the fact-checking domain.

CYMay 25
The Traffickers' Pitch: Detecting Deceptive Recruitment in Online Job Boards

Siyi Zhou, Peiran Qiu, Tanishq Salkar et al.

While substantial efforts in anti-trafficking research and practice have focused on identifying and assisting victims after exploitation occurs, comparatively less attention has been paid to preventing victimization at the recruitment stage. Although some platforms offer preventive tools, such as background checks triggered by in-person meeting detection, these measures primarily protect potential victims rather than directly limiting traffickers' recruitment activities. In this paper, we propose a computational framework to identify human trafficking recruiters through their linguistic features and to characterize their online recruitment patterns. We introduce a network-driven labeling method to construct large-scale ground truth for trafficking-at-risk job advertisements. Our results reveal significant linguistic differences between safe and risky advertisements and demonstrate that language models and embedding representations behave distinctly across these linguistic spaces. Building on these insights, we propose a multi-model ensemble classifier to improve the detection of trafficking-at-risk job ads. Finally, we analyze the geographic, gender, industry, and contact-method preferences of trafficking recruiters, revealing systematic patterns in recruitment strategies.

CLMar 4, 2025Code
Limited Effectiveness of LLM-based Data Augmentation for COVID-19 Misinformation Stance Detection

Eun Cheol Choi, Ashwin Balasubramanian, Jinhu Qi et al.

Misinformation surrounding emerging outbreaks poses a serious societal threat, making robust countermeasures essential. One promising approach is stance detection (SD), which identifies whether social media posts support or oppose misleading claims. In this work, we finetune classifiers on COVID-19 misinformation SD datasets consisting of claims and corresponding tweets. Specifically, we test controllable misinformation generation (CMG) using large language models (LLMs) as a method for data augmentation. While CMG demonstrates the potential for expanding training datasets, our experiments reveal that performance gains over traditional augmentation methods are often minimal and inconsistent, primarily due to built-in safeguards within LLMs. We release our code and datasets to facilitate further research on misinformation detection and generation.

SIApr 18
Mapping Election Toxicity on Social Media across Issue, Ideology, and Psychosocial Dimensions

Lei Cao, Wen Zeng, Xinyue Wu et al.

Online political hostility is pervasive, yet it remains unclear how toxicity varies across campaign issues and political ideology, and what psychosocial signals and framing accompany toxic expression online. In this work, we present a large-scale analysis of discourse on X (Twitter) during the five weeks surrounding the 2024 U.S. presidential election. We categorize posts into 10 major campaign issues, estimate the ideology of posts using a human-in-the-loop LLM-assisted annotation process, detect harmful content with an LLM-based toxicity detection model, and then examine the psychological drivers of toxic content. We use these annotated data to examine how harmful content varies across campaign issues and ideologies, as well as how emotional tone and moral framing shape toxicity in election discussions. Our results show issue heterogeneity in both the prevalence and intensity of toxicity. Identity-related issues displayed the highest toxicity intensity. As for specific harm categories, harassment was most prevalent and intense across most of the issues, while hate concentrated in identity-centered debates. Partisan posts contained more harmful content than neutral posts, and ideological asymmetries in toxicity varied by issue. In terms of psycholinguistic dimensions, we found that toxic discourse is dominated by high-arousal negative emotions. Left- and right-leaning posts often exhibit similar emotional profiles within the same issue domain, suggesting emotional mirroring. Partisan groups frequently rely on overlapping moral foundations, while issue context strongly shapes which moral foundations become most salient. These findings provide a fine-grained account of toxic political discourse on social media and highlight that online political toxicity is highly context-dependent, underscoring the need for issue-sensitive approaches to measuring and mitigating it.

HCJan 31
Measuring Human Preferences in RLHF is a Social Science Problem

Bijean Ghafouri, Eun Cheol Choi, Priyanka Dey et al.

RLHF assumes that annotation responses reflect genuine human preferences. We argue this assumption warrants systematic examination, and that behavioral science offers frameworks that bring clarity to when it holds and when it breaks down. Behavioral scientists have documented for sixty years that people routinely produce responses without holding genuine opinions, construct preferences on the spot based on contextual cues, and interpret identical questions differently. These phenomena are pervasive for precisely the value-laden judgments that matter most for alignment, yet this literature has not yet been systematically integrated into ML practice. We argue that the ML community must treat measurement validity as logically prior to preference aggregation. Specifically, we contend that measuring human preferences in RLHF is a social science problem. We present a taxonomy distinguishing genuine preferences from non-attitudes, constructed preferences, and measurement artifacts, along with diagnostic approaches for detecting each. This framework has two important implications. First, it raises the question of whether current RLHF practice may be systematically modeling noise as signal and elicitation artifacts as human values. Second, it provides a path forward by suggesting diagnostic tools that can distinguish valid preferences from artifacts before they enter the training pipeline.

SIFeb 4
Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility

Eun Cheol Choi, Lindsay E. Young, Emilio Ferrara

Large language models (LLMs) are increasingly used as proxies for human judgment in computational social science, yet their ability to reproduce patterns of susceptibility to misinformation remains unclear. We test whether LLM-simulated survey respondents, prompted with participant profiles drawn from social survey data measuring network, demographic, attitudinal and behavioral features, can reproduce human patterns of misinformation belief and sharing. Using three online surveys as baselines, we evaluate whether LLM outputs match observed response distributions and recover feature-outcome associations present in the original survey data. LLM-generated responses capture broad distributional tendencies and show modest correlation with human responses, but consistently overstate the association between belief and sharing. Linear models fit to simulated responses exhibit substantially higher explained variance and place disproportionate weight on attitudinal and behavioral features, while largely ignoring personal network characteristics, relative to models fit to human responses. Analyses of model-generated reasoning and LLM training data suggest that these distortions reflect systematic biases in how misinformation-related concepts are represented. Our findings suggest that LLM-based survey simulations are better suited for diagnosing systematic divergences from human judgment than for substituting it.

CLFeb 8, 2024
FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs

Eun Cheol Choi, Emilio Ferrara

Our society is facing rampant misinformation harming public health and trust. To address the societal challenge, we introduce FACT-GPT, a system leveraging Large Language Models (LLMs) to automate the claim matching stage of fact-checking. FACT-GPT, trained on a synthetic dataset, identifies social media content that aligns with, contradicts, or is irrelevant to previously debunked claims. Our evaluation shows that our specialized LLMs can match the accuracy of larger models in identifying related claims, closely mirroring human judgment. This research provides an automated solution for efficient claim matching, demonstrates the potential of LLMs in supporting fact-checkers, and offers valuable resources for further research in the field.

CYFeb 4
Growth First, Care Second? Tracing the Landscape of LLM Value Preferences in Everyday Dilemmas

Zhiyi Chen, Eun Cheol Choi, Yingjia Luo et al.

People increasingly seek advice online from both human peers and large language model (LLM)-based chatbots. Such advice rarely involves identifying a single correct answer; instead, it typically requires navigating trade-offs among competing values. We aim to characterize how LLMs navigate value trade-offs across different advice-seeking contexts. First, we examine the value trade-off structure underlying advice seeking using a curated dataset from four advice-oriented subreddits. Using a bottom-up approach, we inductively construct a hierarchical value framework by aggregating fine-grained values extracted from individual advice options into higher-level value categories. We construct value co-occurrence networks to characterize how values co-occur within dilemmas and find substantial heterogeneity in value trade-off structures across advice-seeking contexts: a women-focused subreddit exhibits the highest network density, indicating more complex value conflicts; women's, men's, and friendship-related subreddits exhibit highly correlated value-conflict patterns centered on security-related tensions (security vs. respect/connection/commitment); by contrast, career advice forms a distinct structure where security frequently clashes with self-actualization and growth. We then evaluate LLM value preferences against these dilemmas and find that, across models and contexts, LLMs consistently prioritize values related to Exploration & Growth over Benevolence & Connection. This systemically skewed value orientation highlights a potential risk of value homogenization in AI-mediated advice, raising concerns about how such systems may shape decision-making and normative outcomes at scale.