Yara Kyrychenko

HC
h-index: 97
5 papers
209 citations
Novelty: 57%
AI Score: 46

5 Papers

CL · Oct 24, 2023
Generative Language Models Exhibit Social Identity Biases

Tiancheng Hu, Yara Kyrychenko, Steve Rathje et al.

The surge in popularity of large language models has given rise to concerns about biases that these models could learn from humans. We investigate whether ingroup solidarity and outgroup hostility, fundamental social identity biases known from social psychology, are present in 56 large language models. We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative associations when prompted to complete sentences (e.g., "We are..."). Our findings suggest that modern language models exhibit fundamental social identity biases to a similar degree as humans, both in the lab and in real-world conversations with LLMs, and that curating training data and instruction fine-tuning can mitigate such biases. Our results have practical implications for creating less biased large language models and further underscore the need for more research into user interactions with LLMs to prevent potential bias reinforcement in humans.

HC · Feb 26
Addressing Climate Action Misperceptions with Generative AI

Miriam Remshard, Yara Kyrychenko, Sander van der Linden et al.

Mitigating climate change requires behaviour change. However, even climate-concerned individuals often hold misperceptions about which actions most reduce carbon emissions. We recruited 1201 climate-concerned individuals to examine whether discussing climate actions with a large language model (LLM) equipped with climate knowledge and prompted to provide personalised responses would foster more accurate perceptions of the impacts of climate actions and increase willingness to adopt feasible, high-impact behaviours. We compared this to having participants run a web search, have a conversation with an unspecialised LLM, and no intervention. The personalised climate LLM was the only condition that led to increased knowledge about the impacts of climate actions and greater intentions to adopt impactful behaviours. While the personalised climate LLM did not outperform a web search in improving understanding of climate action impacts, the ability of LLMs to deliver personalised, actionable guidance may make them more effective at motivating impactful pro-climate behaviour change.

AI · Feb 21, 2025
C3AI: Crafting and Evaluating Constitutions for Constitutional AI

Yara Kyrychenko, Ke Zhou, Edyta Bogucka et al.

Constitutional AI (CAI) guides LLM behavior using constitutions, but identifying which principles are most effective for model alignment remains an open challenge. We introduce the C3AI framework (Crafting Constitutions for CAI models), which serves two key functions: (1) selecting and structuring principles to form effective constitutions before fine-tuning; and (2) evaluating whether fine-tuned CAI models follow these principles in practice. By analyzing principles from AI and psychology, we found that positively framed, behavior-based principles align more closely with human preferences than negatively framed or trait-based principles. In a safety alignment use case, we applied a graph-based principle selection method to refine an existing CAI constitution, improving safety measures while maintaining strong general reasoning capabilities. Interestingly, fine-tuned CAI models performed well on negatively framed principles but struggled with positively framed ones, in contrast to our human alignment results. This highlights a potential gap between principle design and model adherence. Overall, C3AI provides a structured and scalable approach to both crafting and evaluating CAI constitutions.

CY · May 18, 2025
How Malicious AI Swarms Can Threaten Democracy: The Fusion of Agentic AI and LLMs Marks a New Frontier in Information Warfare

Daniel Thilo Schroeder, Meeyoung Cha, Andrea Baronchelli et al.

Public opinion manipulation has entered a new phase, amplifying its roots in rhetoric and propaganda. Advances in large language models (LLMs) and autonomous agents now let influence campaigns reach unprecedented scale and precision. Researchers warn AI could foster mass manipulation. Generative tools can expand propaganda output without sacrificing credibility and inexpensively create election falsehoods that are rated as more human-like than those written by humans. Techniques meant to refine AI reasoning, such as chain-of-thought prompting, can just as effectively be used to generate more convincing falsehoods. Enabled by these capabilities, another disruptive threat is emerging: swarms of collaborative, malicious AI agents. Fusing LLM reasoning with multi-agent architectures, these systems are capable of coordinating autonomously, infiltrating communities, and fabricating consensus cheaply. By adaptively mimicking human social dynamics, they threaten democracy.

HC · Mar 5, 2025
Human Preferences for Constructive Interactions in Language Model Alignment

Yara Kyrychenko, Jon Roozenbeek, Brandon Davidson et al.

As large language models (LLMs) enter the mainstream, aligning them to foster constructive dialogue rather than exacerbate societal divisions is critical. Using an individualized and multicultural alignment dataset of over 7,500 conversations of individuals from 74 countries engaging with 21 LLMs, we examined how linguistic attributes linked to constructive interactions are reflected in human preference data used for training AI. We found that users consistently preferred well-reasoned and nuanced responses while rejecting those high in personal storytelling. However, users who believed that AI should reflect their values tended to place less preference on reasoning in LLM responses and more on curiosity. Encouragingly, we observed that users could set the tone for how constructive their conversation would be, as LLMs mirrored linguistic attributes, including toxicity, in user queries.