SD · AI · HC · Feb 26

Same Words, Different Judgments: Modality Effects on Preference Alignment

arXiv:2602.22710v1
h-index: 14
Originality: Highly original
AI Analysis

This study addresses the underexplored application of preference-based reinforcement learning to speech. It highlights differences in how people judge the same content across modalities, which matter to developers aligning AI systems.

This paper explores the impact of modality (text vs. audio) on human preference judgments for AI alignment. It finds that while audio preferences are as reliable as text (ICC(2,k) ≈ 0.80 at ∼9 raters), the modality significantly alters judgment criteria, leading to near-chance cross-modality agreement.

Preference-based reinforcement learning (PbRL) is the dominant framework for aligning AI systems to human preferences, but its application to speech remains underexplored. We present a controlled cross-modal study of human and synthetic preference annotations, comparing text and audio evaluations of identical semantic content across 100 prompts. Audio preferences prove as reliable as text, with inter-rater agreement reaching good levels (ICC(2,k) $\approx$ .80) at $\sim$9 raters -- the first ICC-based reliability characterization in the preference annotation literature for either modality. However, modality reshapes how people judge: audio raters exhibit narrower decision thresholds, reduced length bias, and more user-oriented evaluation criteria, with near-chance cross-modality agreement. Synthetic ratings further align with human judgments and predict inter-rater agreement, supporting their use both for triaging ambiguous pairs and as full replacements for human annotations.
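The reliability figure quoted above, ICC(2,k), is the two-way random-effects intraclass correlation for the average of k raters. As a minimal sketch of how such a value can be computed from a complete items-by-raters score matrix, the function below follows the standard ANOVA mean-squares definition. The function name `icc2k` and the `toy` matrix are illustrative assumptions; they are not the study's code or data.

```python
def icc2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, mean of k raters.

    `ratings` is a list of rows, one row per rated item and one column per
    rater, with no missing values (an assumption of this sketch).
    """
    n = len(ratings)       # number of rated items
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-item mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (msc - mse) / n)


# Hypothetical toy data: 6 items scored by 4 raters.
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc2k(toy), 2))
```

Because ICC(2,k) describes the reliability of the averaged rating, it generally rises as more raters are pooled. That is the sense in which the abstract reports agreement reaching about .80 at around 9 raters.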

