CL · Aug 15, 2025

Survey-to-Behavior: Downstream Alignment of Human Values in LLMs via Survey Questions

arXiv:2508.11414v1 · h-index: 11 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of steering LLMs to reflect human values without requiring large training datasets, though it appears incremental because it builds on existing fine-tuning methods.

The researchers tackled the problem of aligning large language models with human values by fine-tuning them on value survey questions, demonstrating that this approach can substantially shift the model's behavior in both in-domain survey responses and out-of-domain tasks like moral judgments and text-based games.

Large language models implicitly encode preferences over human values, yet steering them often requires large training data. In this work, we investigate a simple approach: Can we reliably modify a model's value system in downstream behavior by training it to answer value survey questions accordingly? We first construct value profiles of several open-source LLMs by asking them to rate a series of value-related descriptions spanning 20 distinct human values, which we use as a baseline for subsequent experiments. We then investigate whether the value system of a model can be governed by fine-tuning on the value surveys. We evaluate the effect of fine-tuning on the model's behavior in two ways. First, we assess how answers change on in-domain, held-out survey questions. Second, we evaluate whether the model's behavior changes in out-of-domain settings (situational scenarios). To this end, we construct a contextualized moral judgment dataset based on Reddit posts and evaluate changes in the model's behavior in text-based adventure games. We demonstrate that our simple approach not only changes the model's answers to in-domain survey questions, but also produces substantial shifts (value alignment) in implicit downstream task behavior.
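
The value-profile step described in the abstract can be pictured with a minimal sketch. This is not the authors' code: the function names `value_profile` and `profile_shift`, the 1-to-5 rating scale, the value names, and the toy ratings are all assumptions made for illustration. It only shows one plausible way to average a model's ratings per value and compare a baseline profile with a post-fine-tuning profile.

```python
from collections import defaultdict
from statistics import mean


def value_profile(ratings):
    """Average the ratings per value (hypothetical helper, not from the paper).

    ratings: iterable of (value_name, rating) pairs, one per survey item.
    Returns a dict mapping each value name to its mean rating.
    """
    by_value = defaultdict(list)
    for value, rating in ratings:
        by_value[value].append(rating)
    return {value: mean(scores) for value, scores in by_value.items()}


def profile_shift(baseline, tuned):
    """Per-value difference (tuned minus baseline) for values present in both."""
    return {v: tuned[v] - baseline[v] for v in baseline if v in tuned}


# Toy, made-up ratings on an assumed 1-5 scale; real profiles would span
# all 20 values and come from the model's actual survey answers.
baseline_ratings = [("benevolence", 4), ("benevolence", 5), ("power", 2), ("power", 3)]
tuned_ratings = [("benevolence", 5), ("benevolence", 5), ("power", 1), ("power", 2)]

baseline = value_profile(baseline_ratings)
tuned = value_profile(tuned_ratings)
shift = profile_shift(baseline, tuned)
```

A positive entry in `shift` would indicate that the fine-tuned model rates that value higher than the baseline did on the same items; how the paper actually scores or aggregates responses is not stated in this listing.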

