Same Words, Different Judgments: Modality Effects on Preference Alignment

Aaron Broukhim, Nadir Weibel, Eshin Jolly

arXiv:2602.22710v1 · 2.2 · h-index: 28

Originality: Highly original

AI Analysis

This study addresses the underexplored application of preference-based reinforcement learning (PbRL) to speech, highlighting differences in human judgment across modalities that matter to developers aligning AI systems.

This paper examines how modality (text vs. audio) affects the human preference judgments used for AI alignment. It finds that audio preferences are as reliable as text preferences (ICC(2,k) ≈ 0.80 at ∼9 raters), but that modality substantially changes the criteria raters apply, yielding near-chance agreement across modalities.
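The "0.80 at ∼9 raters" figure can be read through the Spearman-Brown relation, which holds exactly between the Shrout-Fleiss single-rater ICC(2,1) and the k-rater ICC(2,k) computed on the same data. The back-of-the-envelope arithmetic below is illustrative only: it assumes the rounded summary values (ICC(2,k) = 0.80 at exactly k = 9), and the implied single-rater value ρ₁ is derived here, not reported by the authors.

```latex
% Spearman-Brown link between single-rater and k-rater reliability
\mathrm{ICC}(2,k) = \frac{k\,\rho_1}{1 + (k-1)\,\rho_1}, \qquad \rho_1 = \mathrm{ICC}(2,1)

% Inverting for the implied single-rater reliability (assumed inputs: 0.80, k = 9)
\rho_1 = \frac{\mathrm{ICC}(2,k)}{k - (k-1)\,\mathrm{ICC}(2,k)}
       = \frac{0.80}{9 - 8(0.80)} = \frac{0.80}{2.6} \approx 0.31

% Raters needed to reach a target reliability R, assuming new raters resemble the original pool
k^{*} = \frac{R\,(1-\rho_1)}{\rho_1\,(1-R)}
```

Plugging R = 0.80 back into the last line recovers k* = 9, a quick consistency check. Projecting to other targets assumes that additional raters behave like the original ones.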

Preference-based reinforcement learning (PbRL) is the dominant framework for aligning AI systems to human preferences, but its application to speech remains underexplored. We present a controlled cross-modal study of human and synthetic preference annotations, comparing text and audio evaluations of identical semantic content across 100 prompts. Audio preferences prove as reliable as text preferences, with inter-rater agreement reaching good levels (ICC(2,k) ≈ 0.80) at ∼9 raters; this is the first ICC-based reliability characterization in the preference annotation literature for either modality. However, modality reshapes how people judge: audio raters exhibit narrower decision thresholds, reduced length bias, and more user-oriented evaluation criteria, with near-chance cross-modality agreement. Synthetic ratings also align with human judgments and predict inter-rater agreement, supporting their use both for triaging ambiguous pairs and as full replacements for human annotations.
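The abstract reports reliability as ICC(2,k) without defining it. As a hedged illustration, not the authors' analysis code, the sketch below computes the Shrout & Fleiss (1979) two-way random-effects, absolute-agreement ICCs from a complete targets-by-raters matrix: ICC(2,1) for a single rater and ICC(2,k) for the mean of k raters. The function name `icc2`, the pure-Python implementation, and the assumption of a complete matrix with no missing ratings are illustrative choices; how the paper converts pairwise preferences into scores is not stated here.

```python
"""Sketch: Shrout-Fleiss ICC(2,1) and ICC(2,k) from a targets x raters matrix."""
from typing import Sequence


def icc2(ratings: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Return (ICC(2,1), ICC(2,k)) for a complete n-targets x k-raters matrix.

    Hypothetical helper: two-way random effects, absolute agreement,
    no missing cells allowed.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with at least 2 targets and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition total variation into targets (rows), raters (columns), and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares from the two-way ANOVA table.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Single-rater reliability: rater (column) variance counts against agreement.
    icc_single = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    # Reliability of the mean of the k raters.
    icc_average = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    return icc_single, icc_average


if __name__ == "__main__":
    # Worked example from Shrout & Fleiss (1979): 6 targets rated by 4 judges.
    data = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    single, average = icc2(data)
    print(f"ICC(2,1) = {single:.2f}, ICC(2,k) = {average:.2f}")  # ICC(2,1) = 0.29, ICC(2,k) = 0.62
```

Running the file prints ICC(2,1) = 0.29 and ICC(2,k) = 0.62, the values Shrout and Fleiss report for this example. The gap between the two shows why averaging over several raters, as in the ∼9-rater figure above, raises reliability well above that of any single annotator.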

