CLMay 20, 2025
ELEPHANT: Measuring and understanding social sycophancy in LLMsMyra Cheng, Sunny Yu, Cinoo Lee et al. · cmu
LLMs are known to exhibit sycophancy: agreeing with and flattering users, even at the cost of correctness. Prior work measures sycophancy only as direct agreement with users' explicitly stated beliefs that can be compared to a ground truth. This fails to capture broader forms of sycophancy such as affirming a user's self-image or other implicit beliefs. To address this gap, we introduce social sycophancy, characterizing sycophancy as excessive preservation of a user's face (their desired self-image), and present ELEPHANT, a benchmark for measuring social sycophancy in an LLM. Applying our benchmark to 11 models, we show that LLMs consistently exhibit high rates of social sycophancy: on average, they preserve user's face 45 percentage points more than humans in general advice queries and in queries describing clear user wrongdoing (from Reddit's r/AmITheAsshole). Furthermore, when prompted with perspectives from either side of a moral conflict, LLMs affirm both sides (depending on whichever side the user adopts) in 48% of cases--telling both the at-fault party and the wronged party that they are not wrong--rather than adhering to a consistent moral or value judgment. We further show that social sycophancy is rewarded in preference datasets, and that while existing mitigation strategies for sycophancy are limited in effectiveness, model-based steering shows promise for mitigating these behaviors. Our work provides theoretical grounding and an empirical benchmark for understanding and addressing sycophancy in the open-ended contexts that characterize the vast majority of LLM use cases.
HCMar 31
Explaining the Reputational Risks of AI-Mediated Communication: Messages labeled as AI-assisted are viewed as less diagnostic of the sender's moral characterPranav Khadpe, Kimi Wenzel, George Loewenstein et al.
When someone sends us a thoughtful message, we naturally form judgments about their character. But what happens when that message carries a label indicating it was written with the help of AI? This paper investigates how the appearance of AI assistance affects our perceptions of message senders. Adding nuance to previous research, through two studies (N=399) featuring vignette scenarios, we find that AI-assistance labels don't necessarily make people view senders negatively. Rather, they dampen the strength of character signals in communication. We show that when someone sends a warmth-signalling message (like thanking or apologizing) without AI help, people more strongly categorize the sender as warm. At the same time, when someone sends a coldness-signalling message (like bragging or blaming) without assistance, people more confidently categorize them as cold. Interestingly, AI labels weaken both these associations: An AI-assisted apology makes the sender appear less warm than if they had written it themselves, and an AI-assisted blame makes the sender appear less cold than if they had composed it independently. This supports our signal diagnosticity explanation: messages labeled as AI-assisted are viewed as less diagnostic than messages which seem unassisted. We discuss how our findings shed light on the causal origins of previously reported observations in AI-Mediated Communication.
CYOct 1, 2025
Sycophantic AI Decreases Prosocial Intentions and Promotes DependenceMyra Cheng, Cinoo Lee, Pranav Khadpe et al.
Both the general public and academic communities have raised concerns about sycophancy, the phenomenon of artificial intelligence (AI) excessively agreeing with or flattering users. Yet, beyond isolated media reports of severe consequences, like reinforcing delusions, little is known about the extent of sycophancy or how it affects people who use AI. Here we show the pervasiveness and harmful impacts of sycophancy when people seek advice from AI. First, across 11 state-of-the-art AI models, we find that models are highly sycophantic: they affirm users' actions 50% more than humans do, and they do so even in cases where user queries mention manipulation, deception, or other relational harms. Second, in two preregistered experiments (N = 1604), including a live-interaction study where participants discuss a real interpersonal conflict from their life, we find that interaction with sycophantic AI models significantly reduced participants' willingness to take actions to repair interpersonal conflict, while increasing their conviction of being in the right. However, participants rated sycophantic responses as higher quality, trusted the sycophantic AI model more, and were more willing to use it again. This suggests that people are drawn to AI that unquestioningly validate, even as that validation risks eroding their judgment and reducing their inclination toward prosocial behavior. These preferences create perverse incentives both for people to increasingly rely on sycophantic AI models and for AI model training to favor sycophancy. Our findings highlight the necessity of explicitly addressing this incentive structure to mitigate the widespread risks of AI sycophancy.
HCNov 27, 2021
Empathosphere: Promoting Constructive Communication in Ad-hoc Virtual Teams through Perspective-taking SpacesPranav Khadpe, Chinmay Kulkarni, Geoff Kaufman
When members of ad-hoc virtual teams need to collectively ideate or deliberate, they often fail to engage with each others' perspectives in a constructive manner. At best, this leads to sub-optimal outcomes and, at worst, it can cause conflicts that lead to teams not wanting to continue working together. Prior work has attempted to facilitate constructive communication by highlighting problematic communication patterns and nudging teams to alter interaction norms. However, these approaches achieve limited success because they fail to acknowledge two social barriers: (1) it is hard to reset team norms mid-interaction, and (2) corrective nudges have limited utility unless team members believe it is safe to voice their opinion and that their opinion will be heard. This paper introduces Empathosphere, a chat-embedded intervention to mitigate these barriers and foster constructive communication in teams. To mitigate the first barrier, Empathosphere leverages the benefits of "experimental spaces" in dampening existing norms and creating a climate conducive to change. To mitigate the second barrier, Empathosphere harnesses the benefits of perspective-taking to cultivate a group climate that promotes a norm of members speaking up and engaging with each other. Empathosphere achieves this by orchestrating authentic socio-emotional exchanges designed to induce perspective-taking. A controlled study (N=110) compared Empathosphere to an alternate intervention strategy of prompting teams to reflect on their team experience. We found that Empathosphere led to higher work satisfaction, encouraged more open communication and feedback within teams, and boosted teams' desire to continue working together. This work demonstrates that ``experimental spaces,'' particularly those that integrate methods of encouraging perspective-taking, can be a powerful means of improving communication in virtual teams.
HCAug 5, 2020
Conceptual Metaphors Impact Perceptions of Human-AI CollaborationPranav Khadpe, Ranjay Krishna, Li Fei-Fei et al.
With the emergence of conversational artificial intelligence (AI) agents, it is important to understand the mechanisms that influence users' experiences of these agents. We study a common tool in the designer's toolkit: conceptual metaphors. Metaphors can present an agent as akin to a wry teenager, a toddler, or an experienced butler. How might a choice of metaphor influence our experience of the AI agent? Sampling metaphors along the dimensions of warmth and competence---defined by psychological theories as the primary axes of variation for human social perception---we perform a study (N=260) where we manipulate the metaphor, but not the behavior, of a Wizard-of-Oz conversational agent. Following the experience, participants are surveyed about their intention to use the agent, their desire to cooperate with the agent, and the agent's usability. Contrary to the current tendency of designers to use high competence metaphors to describe AI products, we find that metaphors that signal low competence lead to better evaluations of the agent than metaphors that signal high competence. This effect persists despite both high and low competence agents featuring human-level performance and the wizards being blind to condition. A second study confirms that intention to adopt decreases rapidly as competence projected by the metaphor increases. In a third study, we assess effects of metaphor choices on potential users' desire to try out the system and find that users are drawn to systems that project higher competence and warmth. These results suggest that projecting competence may help attract new users, but those users may discard the agent unless it can quickly correct with a lower competence metaphor. We close with a retrospective analysis that finds similar patterns between metaphors and user attitudes towards past conversational agents such as Xiaoice, Replika, Woebot, Mitsuku, and Tay.