Bijean Ghafouri

Semantic Scholar Profile

h-index18

4papers

9citations

Novelty64%

AI Score46

Ranked #63,684 of 201,326 authors (top 32%)#412 in HC (top 14%)

4 Papers

HCFeb 11

What do people want to fact-check?

Bijean Ghafouri, Dorsaf Sallami, Luca Luceri et al.

Research on misinformation has focused almost exclusively on supply, asking what falsehoods circulate, who produces them, and whether corrections work. A basic demand-side question remains unanswered. When ordinary people can fact-check anything they want, what do they actually ask about? We provide the first large-scale evidence on this question by analyzing close to 2{,}500 statements submitted by 457 participants to an open-ended AI fact-checking system. Each claim is classified along five semantic dimensions (domain, epistemic form, verifiability, target entity, and temporal reference), producing a behavioral map of public verification demand. Three findings stand out. First, users range widely across topics but default to a narrow epistemic repertoire, overwhelmingly submitting simple descriptive claims about present-day observables. Second, roughly one in four requests concerns statements that cannot be empirically resolved, including moral judgments, speculative predictions, and subjective evaluations, revealing a systematic mismatch between what users seek from fact-checking tools and what such tools can deliver. Third, comparison with the FEVER benchmark dataset exposes sharp structural divergences across all five dimensions, indicating that standard evaluation corpora encode a synthetic claim environment that does not resemble real-world verification needs. These results reframe fact-checking as a demand-driven problem and identify where current AI systems and benchmarks are misaligned with the uncertainty people actually experience.

HCJan 31

Measuring Human Preferences in RLHF is a Social Science Problem

Bijean Ghafouri, Eun Cheol Choi, Priyanka Dey et al.

RLHF assumes that annotation responses reflect genuine human preferences. We argue this assumption warrants systematic examination, and that behavioral science offers frameworks that bring clarity to when it holds and when it breaks down. Behavioral scientists have documented for sixty years that people routinely produce responses without holding genuine opinions, construct preferences on the spot based on contextual cues, and interpret identical questions differently. These phenomena are pervasive for precisely the value-laden judgments that matter most for alignment, yet this literature has not yet been systematically integrated into ML practice. We argue that the ML community must treat measurement validity as logically prior to preference aggregation. Specifically, we contend that measuring human preferences in RLHF is a social science problem. We present a taxonomy distinguishing genuine preferences from non-attitudes, constructed preferences, and measurement artifacts, along with diagnostic approaches for detecting each. This framework has two important implications. First, it raises the question of whether current RLHF practice may be systematically modeling noise as signal and elicitation artifacts as human values. Second, it provides a path forward by suggesting diagnostic tools that can distinguish valid preferences from artifacts before they enter the training pipeline.

CLNov 10, 2024

Epistemic Integrity in Large Language Models

Bijean Ghafouri, Shahrad Mohammadzadeh, James Zhou et al.

Large language models are increasingly relied upon as sources of information, but their propensity for generating false or misleading statements with high confidence poses risks for users and society. In this paper, we confront the critical problem of epistemic miscalibration $\unicode{x2013}$ where a model's linguistic assertiveness fails to reflect its true internal certainty. We introduce a new human-labeled dataset and a novel method for measuring the linguistic assertiveness of Large Language Models (LLMs) which cuts error rates by over 50% relative to previous benchmarks. Validated across multiple datasets, our method reveals a stark misalignment between how confidently models linguistically present information and their actual accuracy. Further human evaluations confirm the severity of this miscalibration. This evidence underscores the urgent risk of the overstated certainty LLMs hold which may mislead users on a massive scale. Our framework provides a crucial step forward in diagnosing this miscalibration, offering a path towards correcting it and more trustworthy AI across domains.

HCAug 20, 2025

A Theory of Information, Variation, and Artificial Intelligence

Bijean Ghafouri

A growing body of empirical work suggests that the widespread adoption of generative AI produces a significant homogenizing effect on information, creativity, and cultural production. I first develop a novel theoretical framework to explain this phenomenon. I argue that a dynamic of AI-derivative epistemology, in which individuals increasingly defer to AI outputs, allows a centralized AI Prism to function, a technical mechanism whose architecture is designed to reduce variance and converge on the statistical mean. This provides a causal explanation for the generative monocultures observed in recent studies. However, I contend this represents only the first stage of a more complex and dialectical process. This paper's central and paradoxical thesis is that the very homogenization that flattens knowledge within specialized domains simultaneously renders that knowledge into consistent modules that can be recombined across them, a process foundational to innovation and creativity. However, this recombinant potential is not automatic, but rather conditional. This paper argues that these opposing forces, homogenizing defaults versus recombinant possibilities, are governed by the nature of human engagement with the technology. The ultimate effect of generative AI is conditional on whether individuals act as passive consumers deferring to the AI's statistical outputs, or as active curators who critically interrogate, re-contextualize, and recombine them. The paper concludes by outlining the cognitive and institutional scaffolds required to resolve this tension, arguing they are the decisive variable that determine whether generative AI becomes an instrument of innovation or homogenization.