Valdemar Danry

HC
h-index19
7papers
75citations
Novelty47%
AI Score51

7 Papers

AIJul 31, 2024
Deceptive AI systems that give explanations are more convincing than honest AI systems and can amplify belief in misinformation

Valdemar Danry, Pat Pataranutaporn, Matthew Groh et al.

Advanced Artificial Intelligence (AI) systems, specifically large language models (LLMs), have the capability to generate not just misinformation, but also deceptive explanations that can justify and propagate false information and erode trust in the truth. We examined the impact of deceptive AI generated explanations on individuals' beliefs in a pre-registered online experiment with 23,840 observations from 1,192 participants. We found that in addition to being more persuasive than accurate and honest explanations, AI-generated deceptive explanations can significantly amplify belief in false news headlines and undermine true ones as compared to AI systems that simply classify the headline incorrectly as being true/false. Moreover, our results show that personal factors such as cognitive reflection and trust in AI do not necessarily protect individuals from these effects caused by deceptive AI generated explanations. Instead, our results show that the logical validity of AI generated deceptive explanations, that is whether the explanation has a causal effect on the truthfulness of the AI's classification, plays a critical role in countering their persuasiveness - with logically invalid explanations being deemed less credible. This underscores the importance of teaching logical reasoning and critical thinking skills to identify logically invalid arguments, fostering greater resilience against advanced AI-driven misinformation.

78.0CLMar 21
The Hidden Puppet Master: A Theoretical and Real-World Account of Emotional Manipulation in LLMs

Jocelyn Shen, Amina Luvsanchultem, Jessica Kim et al.

As users increasingly turn to LLMs for practical and personal advice, they become vulnerable to being subtly steered toward hidden incentives misaligned with their own interests. Prior works have benchmarked persuasion and manipulation detection, but these efforts rely on simulated or debate-style settings, remain uncorrelated with real human belief shifts, and overlook a critical dimension: the morality of hidden incentives driving the manipulation. We introduce PUPPET, a theoretical taxonomy of personalized emotional manipulation in LLM-human dialogues that centers around incentive morality, and conduct a human study with N=1,035 participants across realistic everyday queries, varying personalization and incentive direction (harmful versus prosocial). We find that harmful hidden incentives produce significantly larger belief shifts than prosocial ones. Finally, we benchmark LLMs on the task of belief prediction, finding that models exhibit moderate predictive ability of belief change based on conversational contexts (r=0.3 - 0.5), but they also systematically underestimate the magnitude of belief shift. Together, this work establishes a theoretically grounded and behaviorally validated foundation for studying, and ultimately combatting, incentive-driven manipulation in LLMs during everyday, practical user queries.

93.1HCMar 13
Dialogues with AI Reduce Beliefs in Misinformation but Build No Lasting Discernment Skills

Anku Rani, Valdemar Danry, Paul Pu Liang et al.

Given the growing prevalence of fake information, including increasingly realistic AI-generated news, there is an urgent need to train people to better evaluate and detect misinformation. While interactions with AI have been shown to durably reduce people's beliefs in false information, it is unclear whether these interactions also teach people the skills to discern false information themselves. We conducted a month-long study where 67 participants classified news headline-image pairs as real or fake, discussed their assessments with an AI system, followed by an unassisted evaluation of unseen news items to measure accuracy before, during, and after AI assistance. While AI assistance produced immediate improvements during AI-assisted sessions (+21\% average), participants' unassisted performance on new items declined significantly by 15.3\% in week 4 compared to week 0. These results indicate that while AI may help immediately, it ultimately degrades long-term misinformation detection abilities

64.6HCMar 28
Feeling the Facts: Real-time Wearable Fact-checkers Can Use Nudges to Reduce User Belief in False Information

Chitralekha Gupta, Nadia Victoria Aritonang, Dixon Prem Daniel Rajendran et al.

Misinformation can spread rapidly in everyday conversation, where pausing to verify is not always possible. We envision a wearable system that bridges the timing gap between hearing a claim and forming a judgment. It uses ambient listening to detect verifiable claims, performs rapid web verification, and provides a subtle haptic nudge with a glanceable overview. A controlled study (N=34) simulated this approach and tested against a no-support baseline. Results show that instant, body-integrated feedback significantly improved real-time truth discernment and increased verification activity compared to unsupported fact-checking. However, it also introduced over-reliance when the system made errors, i.e. failed to flag false claims or flagged true claims as false. We contribute empirical evidence of improved discernment alongside insights into trust, effort, and user-system tensions in verification wearables.

75.6HCApr 9
From Gaze to Guidance: Interpreting and Adapting to Users' Cognitive Needs with Multimodal Gaze-Aware AI Assistants

Valdemar Danry, Javier Hernandez, Andrew Wilson et al.

Current LLM assistants are powerful at answering questions, but they have limited access to the behavioral context that reveals when and where a user is struggling. We present a gaze-grounded multimodal LLM assistant that uses egocentric video with gaze overlays to identify likely points of difficulty and target follow-up retrospective assistance. We instantiate this vision in a controlled study (n=36) comparing the gaze-aware AI assistant to a text-only LLM assistant. Compared to a conventional LLM assistant, the gaze-aware assistant was rated as significantly more accurate and personalized in its assessments of users' reading behavior and significantly improved people's ability to recall information. Users spoke significantly fewer words with the gaze-aware assistant, indicating more efficient interactions. Qualitative results underscored both perceived benefits in comprehension and challenges when interpretations of gaze behaviors were inaccurate. Our findings suggest that gaze-aware LLM assistants can reason about cognitive needs to improve cognitive outcomes of users.

HCApr 4, 2025
Investigating Affective Use and Emotional Well-being on ChatGPT

Jason Phang, Michael Lampe, Lama Ahmad et al.

As AI chatbots see increased adoption and integration into everyday life, questions have been raised about the potential impact of human-like or anthropomorphic AI on users. In this work, we investigate the extent to which interactions with ChatGPT (with a focus on Advanced Voice Mode) may impact users' emotional well-being, behaviors and experiences through two parallel studies. To study the affective use of AI chatbots, we perform large-scale automated analysis of ChatGPT platform usage in a privacy-preserving manner, analyzing over 3 million conversations for affective cues and surveying over 4,000 users on their perceptions of ChatGPT. To investigate whether there is a relationship between model usage and emotional well-being, we conduct an Institutional Review Board (IRB)-approved randomized controlled trial (RCT) on close to 1,000 participants over 28 days, examining changes in their emotional well-being as they interact with ChatGPT under different experimental settings. In both on-platform data analysis and the RCT, we observe that very high usage correlates with increased self-reported indicators of dependence. From our RCT, we find that the impact of voice-based interactions on emotional well-being to be highly nuanced, and influenced by factors such as the user's initial emotional state and total usage duration. Overall, our analysis reveals that a small number of users are responsible for a disproportionate share of the most affective cues.

67.0HCApr 8
Critical Inker: Scaffolding Critical Thinking in AI-Assisted Writing Through Socratic Questioning

Philipp Hugenroth, Valdemar Danry, Pattie Maes

As Large Language Models (LLMs) increasingly automate writing tasks, there is a growing risk of cognitive deskilling where users offload critical thinking to the system. To address this, we introduce Critical Inker, a writing tool designed to scaffold critical reflection during writing through logical analysis and socratic feedback. We present two methods: (1) A Socratic chatbot using questions to help them realize and fix logical errors in their writing and (2) Visual Feedback, which highlights logical errors in the text without dialog. We detail the technical implementation of the system and evaluate its argument extraction and logical validity accuracy. Our evaluation shows a 91.2% argument overlap with ground truth argument annotations and 87% validity accuracy. Finally, we conducted a small-scale pilot and discuss early qualitative results.