Priyanka Dey

CL
h-index: 7
6 papers
290 citations
Novelty: 43%
AI Score: 50

6 Papers

SI · Apr 12
Israel-Hamas War on X: A Case Study of Coordinated Campaigns and Information Integrity

Tuğrulcan Elmas, Filipi Nascimento Silva, Manita Pote et al.

Coordinated campaigns on social media play a critical role in shaping crisis information environments, particularly during the onset of conflicts when uncertainty is high and verified information is scarce. We study the interplay between coordinated campaigns and information integrity through a case study of the 2023 Israel-Hamas War on Twitter (X). We analyze 4.5 million tweets and employ established coordination detection methods to identify 11 coordinated groups involving 541 accounts. We characterize these groups through a multimodal analysis that includes topics, account amplification, toxicity, emotional tone, visual themes, and misleading claims. Our analysis reveals that coordinated campaigns rely predominantly on low-complexity tactics, such as retweet amplification and copy-paste diffusion, and promote distinct narratives consistent with a fragmented manipulation landscape without centralized control. Widely amplified misleading claims concentrate within just three of the identified coordinated groups; the remaining groups primarily engage in advocacy, religious solidarity, or humanitarian mobilization. Claim-level integrity, toxicity, and emotional signals are mutually uncorrelated: no single behavioral signal is a reliable proxy for the others. Targeting the most prolific spreaders of misleading content for moderation would be effective in reducing such content. However, targeting prolific amplifiers in general would not achieve the same mitigation effect. These findings suggest that evaluating coordination structures jointly with their specific content footprints is needed to effectively prioritize moderation interventions.

CL · Feb 3, 2023
Investigating Stylistic Profiles for the Task of Empathy Classification in Medical Narrative Essays

Priyanka Dey, Roxana Girju

One important aspect of language is how speakers generate utterances and texts to convey their intended meanings. In this paper, we bring various aspects of the Construction Grammar (CxG) and Systemic Functional Grammar (SFG) theories into a deep learning computational framework to model empathic language. Our corpus consists of 440 essays written by premed students as narrated simulated patient-doctor interactions. We start with baseline classifiers (state-of-the-art recurrent neural networks and transformer models). Then, we enrich these models with a set of linguistic constructions, proving the importance of this novel approach to the task of empathy classification for this dataset. Our results indicate the potential of such constructions to contribute to the overall empathy profile of first-person narrative essays.

SY · May 23
Explicit Ensemble Mean Synchronization for Time Scale Generation with Mixed Atomic Clock Ensembles

Priyanka Dey, Takahiro Kawaguchi, Yuichiro Yano et al.

In this paper, we consider a mixed ensemble containing both cesium-type and hydrogen maser-type atomic clocks. For the mixed ensemble, the conventional Kalman filtering algorithm has certain limitations due to divergence of the error covariance matrix. To overcome these limitations, we obtain a Kalman filtering algorithm based on observable canonical decomposition that does not have any diverging terms. We use the estimates from the transformed Kalman filter to propose a time scale generation algorithm for the mixed ensemble, called the explicit ensemble mean synchronization algorithm. In this algorithm, we synchronize the time deviation of each clock from the ideal clock behavior to the unobservable ensemble mean of the phases, where the weighting can be decided by the user. By regulating the free-running dynamics associated with the unobservable state, through choosing an appropriate weight vector, the frequency stability of the generated time scale or the synchronized time shared by the clocks is optimized over shorter (resp. longer) intervals, as measured by Hadamard variance. An illustrative example is given to demonstrate the efficiency of our algorithm.

HC · Jan 31
Measuring Human Preferences in RLHF is a Social Science Problem

Bijean Ghafouri, Eun Cheol Choi, Priyanka Dey et al.

RLHF assumes that annotation responses reflect genuine human preferences. We argue this assumption warrants systematic examination, and that behavioral science offers frameworks that bring clarity to when it holds and when it breaks down. Behavioral scientists have documented for sixty years that people routinely produce responses without holding genuine opinions, construct preferences on the spot based on contextual cues, and interpret identical questions differently. These phenomena are pervasive for precisely the value-laden judgments that matter most for alignment, yet this literature has not yet been systematically integrated into ML practice. We argue that the ML community must treat measurement validity as logically prior to preference aggregation. Specifically, we contend that measuring human preferences in RLHF is a social science problem. We present a taxonomy distinguishing genuine preferences from non-attitudes, constructed preferences, and measurement artifacts, along with diagnostic approaches for detecting each. This framework has two important implications. First, it raises the question of whether current RLHF practice may be systematically modeling noise as signal and elicitation artifacts as human values. Second, it provides a path forward by suggesting diagnostic tools that can distinguish valid preferences from artifacts before they enter the training pipeline.

CL · Oct 13, 2025
GRAVITY: A Framework for Personalized Text Generation via Profile-Grounded Synthetic Preferences

Priyanka Dey, Daniele Rosa, Wenqing Zheng et al.

Personalization in LLMs often relies on costly human feedback or interaction logs, limiting scalability and neglecting deeper user attributes. To reduce the reliance on human annotations, we introduce GRAVITY (Generative Response with Aligned Values, Interests, and Traits of You), a framework for generating synthetic, profile-grounded preference data that captures users' interests, values, beliefs, and personality traits. By integrating demographic, cultural, and psychological frameworks -- including Hofstede's cultural dimensions, Schwartz's basic values, the World Values Survey, and Big Five OCEAN traits -- GRAVITY synthesizes preference pairs to guide personalized content generation. We evaluate GRAVITY on book descriptions for 400 Amazon users, comparing it to prompt-based conditioning, standard fine-tuning, and naive synthetic pair generation. Profile-grounded synthetic data consistently improves generation, especially across multiple cultures (USA, Brazil, Japan, India), achieving over 4% higher preference gains across baselines, with user studies showing that GRAVITY outputs are preferred over 86% of the time. Our results show that scenario-grounded synthetic data can capture richer user variation, reduce reliance on costly annotation, and produce more engaging, user-centered content, offering a scalable path for LLM personalization.

CL · Apr 1, 2025
Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Cultural Intelligence with CQ-Bench

Ziyi Liu, Priyanka Dey, Jen-tse Huang et al.

Cultural Intelligence (CQ) refers to the ability to understand unfamiliar cultural contexts, a crucial skill for large language models (LLMs) to effectively engage with globally diverse users. Existing studies often focus on explicitly stated cultural norms, but fail to capture the subtle, implicit values that are common in daily conversation. To address this gap, we introduce CQ-Bench, a benchmark specifically designed to assess LLMs' capability to infer implicit cultural values from natural conversational contexts. CQ-Bench consists of multi-character, conversation-based stories using values from the World Values Survey and GlobalOpinions, covering ethical, religious, and social topics, among others. Our automatic dataset construction pipeline integrates rigorous validation procedures (incorporation, consistency, and implicitness checks), achieving 94.5% human-model agreement in the final validation. To leverage CQ-Bench data, we design three tasks of increasing complexity: attitude detection, value selection, and value extraction. These tasks evaluate whether models can detect attitudes and recognize values embedded within natural dialogues rather than relying on explicit cultural knowledge. We find that while frontier models like o1 reach human-level performance in value selection (0.809 F1), they still fall short in nuanced attitude detection (0.622 F1). Notably, fine-tuning a smaller LLaMA-3.2-3B on only 500 culturally rich examples improves performance by over 10%, even outperforming o3-mini in some cases. Using CQ-Bench, we provide insights into the current challenges in LLMs' CQ research and suggest practical pathways for enhancing LLMs' cross-cultural reasoning abilities.