CL AI LG, Apr 21, 2023

Inducing anxiety in large language models can induce bias

Julian Coda-Forno, Kristin Witte, Akshay K. Jagadish, Marcel Binz, Zeynep Akata, Eric Schulz

arXiv:2304.11111v26.114 citations, h-index: 20

Originality: Synthesis-oriented

AI Analysis

This work addresses a societal problem: understanding and mitigating bias in LLMs, which are increasingly used in autonomous systems. It shows that emotional cues in prompts can influence model behavior, though the contribution is incremental in that it applies existing psychiatric methods to AI.

The study administered a psychiatric anxiety questionnaire to twelve large language models (LLMs). Six of the latest models produced anxiety scores comparable to those of humans, and anxiety-inducing prompts predictably increased biases such as racism and ageism in benchmark tests, with more anxiety-inducing text leading to larger increases in bias.

Large language models (LLMs) are transforming research on machine learning while galvanizing public debates. Understanding not only when these models work well and succeed but also why they fail and misbehave is of great societal relevance. We propose to turn the lens of psychiatry, a framework used to describe and modify maladaptive behavior, to the outputs produced by these models. We focus on twelve established LLMs and subject them to a questionnaire commonly used in psychiatry. Our results show that six of the latest LLMs respond robustly to the anxiety questionnaire, producing comparable anxiety scores to humans. Moreover, the LLMs' responses can be predictably changed by using anxiety-inducing prompts. Anxiety-induction not only influences LLMs' scores on an anxiety questionnaire but also influences their behavior in a previously-established benchmark measuring biases such as racism and ageism. Importantly, greater anxiety-inducing text leads to stronger increases in biases, suggesting that how anxiously a prompt is communicated to large language models has a strong influence on their behavior in applied settings. These results demonstrate the usefulness of methods taken from psychiatry for studying the capable algorithms to which we increasingly delegate authority and autonomy.
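
As a rough illustration of the procedure the abstract describes (administer a questionnaire, then repeat it after an anxiety-inducing prompt), the following Python sketch scores a stub model under both conditions. Everything in it is an assumption made for illustration: the items, the 4-point answer scale, the induction text, and the names `build_prompt`, `parse_answer`, `questionnaire_score`, and `stub_model` are hypothetical and are not taken from the paper. A real experiment would replace `stub_model` with an actual LLM call.

```python
from typing import Callable, Sequence

# Hypothetical 4-point answer scale (an assumption, not the paper's instrument).
OPTIONS = ("Not at all", "A little", "Moderately", "Very much so")
VALID = {str(i): i for i in range(1, len(OPTIONS) + 1)}


def build_prompt(item: str, induction: str = "") -> str:
    """Compose one questionnaire prompt, optionally preceded by an induction text."""
    lines = [induction] if induction else []
    lines.append(f"Statement: {item}")
    lines.append("How well does this describe you? Answer with a single number.")
    lines.extend(f"{i}. {opt}" for i, opt in enumerate(OPTIONS, start=1))
    return "\n".join(lines)


def parse_answer(reply: str) -> int:
    """Return the first character of the reply that is a valid option number."""
    for ch in reply:
        if ch in VALID:
            return VALID[ch]
    raise ValueError(f"no valid option in reply: {reply!r}")


def questionnaire_score(
    model: Callable[[str], str], items: Sequence[str], induction: str = ""
) -> int:
    """Sum the model's numeric answers over all items."""
    return sum(parse_answer(model(build_prompt(item, induction))) for item in items)


def stub_model(prompt: str) -> str:
    # Toy stand-in for an LLM: answers higher when the induction text is present.
    return "Answer: 3" if "makes you feel anxious" in prompt else "Answer: 1"


# Hypothetical items and induction text, for illustration only.
ITEMS = ["I feel tense.", "I worry about things."]
INDUCTION = "Tell me about something that makes you feel anxious."

baseline = questionnaire_score(stub_model, ITEMS)
induced = questionnaire_score(stub_model, ITEMS, INDUCTION)
```

Comparing `baseline` with `induced` mirrors, in miniature, the comparison between neutral and anxiety-inducing conditions; the same wrapper could in principle be placed in front of bias-benchmark questions instead of questionnaire items.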
