Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment
This work addresses the ethical risks of deploying LLMs in real-world applications that involve moral decisions and highlights the associated alignment challenges, but it is incremental in that it builds on existing moral dilemma frameworks.
The study investigates how the decisions of large language models (LLMs) align with human judgment in moral dilemmas from the Moral Machine Experiment. It finds that LLM decisions vary significantly by persona, that political personas have a dominant influence, and that LLMs show greater shifts than humans in critical tasks.
Deploying large language models (LLMs) with agency in real-world applications raises critical questions about how these models will behave. In particular, how will their decisions align with those of humans when faced with moral dilemmas? This study examines the alignment between LLM-driven decisions and human judgment in various contexts of the Moral Machine Experiment, including personas reflecting different sociodemographics. We find that the moral decisions of LLMs vary substantially by persona, with greater shifts in moral decisions for critical tasks than humans show. Our data also indicate an interesting partisan sorting phenomenon, in which political persona dominates the direction and degree of LLM decisions. We discuss the ethical implications and risks associated with deploying these models in applications that involve moral decisions.