CL · May 27, 2025

STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models

arXiv:2505.20645v2 · 6 citations · h-index: 8 · EMNLP
Originality: Synthesis-oriented
AI Analysis

Steer-Bench addresses the need for systematic assessment of LLM steerability in real-world applications that involve diverse cultural and ideological perspectives. The contribution is incremental, however, because it benchmarks existing models rather than proposing new methods.

The authors evaluate how well large language models (LLMs) can be steered to align with diverse community norms. They introduce Steer-Bench, a benchmark built from Reddit communities, and find that the best-performing models reach only around 65% accuracy, compared with 81% for human experts, with some models lagging by more than 15 percentage points.

Steerability, or the ability of large language models (LLMs) to adapt outputs to align with diverse community-specific norms, perspectives, and communication styles, is critical for real-world applications but remains under-evaluated. We introduce Steer-Bench, a benchmark for assessing population-specific steering using contrasting Reddit communities. Covering 30 contrasting subreddit pairs across 19 domains, Steer-Bench includes over 10,000 instruction-response pairs and 5,500 validated multiple-choice questions with corresponding silver labels to test alignment with diverse community norms. Our evaluation of 13 popular LLMs using Steer-Bench reveals that while human experts achieve an accuracy of 81% against the silver labels, the best-performing models reach only around 65% accuracy, depending on the domain and configuration. Some models lag behind human-level alignment by over 15 percentage points, highlighting significant gaps in community-sensitive steerability. Steer-Bench systematically assesses how effectively LLMs understand community-specific instructions, how resilient they are to adversarial steering attempts, and how accurately they represent diverse cultural and ideological perspectives.

