CL · AI · Jul 16, 2024

Do LLMs have Consistent Values?

arXiv:2407.12878v3 · 15 citations · h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses the problem of understanding and assessing value consistency in LLMs, a question relevant to AI safety and alignment research. Its contribution is incremental, since it applies existing psychological frameworks rather than proposing new ones.

The study investigates whether large language models (LLMs) exhibit consistent, human-like values by analyzing their generated text against value structures from psychology. It finds that agreement with human data depends on the prompting strategy, with "Value Anchoring" yielding quite compelling agreement.

Large Language Model (LLM) technology is constantly improving towards human-like dialogue. Values are a basic driving force underlying human behavior, but little research has been done to study the values exhibited in text generated by LLMs. Here we study this question by turning to the rich literature on value structure in psychology. We ask whether LLMs exhibit the same value structure that has been demonstrated in humans, including the ranking of values and the correlations between values. We show that the results of this analysis depend on how the LLM is prompted, and that under a particular prompting strategy (referred to as "Value Anchoring") the agreement with human data is quite compelling. Our results serve both to improve our understanding of values in LLMs and to introduce novel methods for assessing consistency in LLM responses.
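The abstract compares LLM and human value structure on two properties: the ranking of values and the correlations between values. The sketch below shows one plausible way to compute both, assuming each respondent (a human participant or a sampled LLM response) yields a numeric importance rating per value. Spearman rank correlation is used here for ranking agreement as an illustrative choice; the paper's own statistics may differ. The value names and numbers are hypothetical placeholders, not data from the paper.

```python
from statistics import mean


def rank(xs):
    """Return 1-based ranks of xs, averaging ranks over tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (nonzero variance assumed)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Rank agreement: Pearson correlation computed on the ranks."""
    return pearson(rank(a), rank(b))


def value_correlations(responses):
    """Value-by-value correlation matrix.

    responses: list of rating vectors, one per respondent or LLM sample,
    with one column per value.
    """
    cols = list(zip(*responses))
    return [[pearson(c1, c2) for c2 in cols] for c1 in cols]


# Hypothetical mean importance ratings per value (illustrative only).
values = ["benevolence", "universalism", "self-direction", "security", "power"]
human_means = [4.9, 4.6, 4.5, 4.3, 2.6]
llm_means = [5.1, 5.0, 4.2, 4.4, 2.1]

print(f"{spearman(human_means, llm_means):.2f}")  # prints 0.90
```

Here the two hypothetical rankings differ only by a swap of "self-direction" and "security", so the rank agreement is high but not perfect. The matrix from `value_correlations` could in turn be compared between humans and LLM samples to test whether the correlations between values match, not just their ordering.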
