CL | Jan 22

Beyond Marginal Distributions: A Framework to Evaluate the Representativeness of Demographic-Aligned LLMs

Tristan Williams, Franziska Weeber, Sebastian Padó, Alan Akbik

arXiv:2601.15755v2 | 0.6 | h-index: 4

Originality: Incremental advance

AI Analysis

The paper addresses the risk of overestimating model alignment capabilities, a concern for researchers and practitioners in AI ethics and value alignment. The advance is incremental because it builds on existing evaluation methods.

The paper tackles the problem of evaluating how well demographic-aligned large language models represent human opinions. It proposes a framework that assesses multivariate correlation patterns in addition to marginal distributions. The results show that both persona prompting and demographic fine-tuning fail to fully capture gold standard correlation patterns from the World Values Survey, revealing structural failures in model representativeness.

Large language models are increasingly used to represent human opinions, values, or beliefs, and their steerability towards these ideals is an active area of research. Existing work focuses predominantly on aligning marginal response distributions, treating each survey item independently. While essential, this may overlook deeper latent structures that characterise real populations and underpin cultural values theories. We propose a framework for evaluating the representativeness of aligned models through multivariate correlation patterns in addition to marginal distributions. We show the value of our evaluation scheme by comparing two model steering techniques (persona prompting and demographic fine-tuning) and evaluating them against human responses from the World Values Survey. While the demographically fine-tuned model better approximates marginal response distributions than persona prompting, both techniques fail to fully capture the gold standard correlation patterns. We conclude that representativeness is a distinct aspect of value alignment and an evaluation focused on marginals can mask structural failures, leading to overly optimistic conclusions about model capabilities.
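
The distinction between marginal alignment and correlation structure can be illustrated with a small sketch. This is not the authors' implementation: the data are synthetic, the function names are hypothetical, and the choice of total variation distance for marginals and Pearson correlation for item relationships is an assumption made for illustration only. The example builds a "model" sample whose per-item marginals match the "human" sample exactly while the relationship between items is destroyed.

```python
import numpy as np

def marginal_tv_distance(a, b, n_levels):
    """Mean total variation distance between the per-item marginals of a and b."""
    dists = []
    for j in range(a.shape[1]):
        pa = np.bincount(a[:, j], minlength=n_levels) / a.shape[0]
        pb = np.bincount(b[:, j], minlength=n_levels) / b.shape[0]
        dists.append(0.5 * np.abs(pa - pb).sum())
    return float(np.mean(dists))

def correlation_gap(a, b):
    """Mean absolute difference between off-diagonal Pearson correlations."""
    ca = np.corrcoef(a, rowvar=False)
    cb = np.corrcoef(b, rowvar=False)
    iu = np.triu_indices_from(ca, k=1)  # upper triangle, diagonal excluded
    return float(np.mean(np.abs(ca[iu] - cb[iu])))

rng = np.random.default_rng(0)
n, n_levels = 2000, 4

# Synthetic "human" responses: item 2 copies item 1 about 80% of the time,
# so the two items are strongly correlated.
item1 = rng.integers(0, n_levels, size=n)
copy_mask = rng.random(n) < 0.8
item2 = np.where(copy_mask, item1, rng.integers(0, n_levels, size=n))
human = np.column_stack([item1, item2])

# Synthetic "model" responses: each column is permuted independently, which
# keeps every marginal identical but breaks the link between the items.
model = np.column_stack([rng.permutation(human[:, j]) for j in range(2)])

tv = marginal_tv_distance(human, model, n_levels)
gap = correlation_gap(human, model)
print(f"marginal TV distance: {tv:.3f}")
print(f"correlation gap:      {gap:.3f}")
```

In this toy setting a marginal-only evaluation would report a perfect match, while the correlation gap stays large, which is the kind of structural failure the abstract says a marginals-focused evaluation can mask.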
