Beyond Words: Measuring User Experience through Speech Analysis in Voice User Interfaces
This work addresses the need for implicit, real-time UX measurement in voice user interfaces, offering an incremental improvement over traditional task performance metrics and self-report questionnaires.
The study evaluates user experience with voice assistants by analyzing speech features. It finds correlations between those features and self-reported satisfaction, and a machine learning model trained on them classifies UX levels with promising accuracy.
Voice assistants (VAs) are typically evaluated through task performance metrics and self-report questionnaires, but people's voices themselves carry rich paralinguistic cues that reveal affect, effort, and interaction breakdowns. We present a within-subjects study (N=49) that systematically compared three VA personas across three usage scenarios to investigate whether speech-derived audio features can serve as a proxy for user experience (UX). We analyzed participants' speech for temporal, spectral, and linguistic markers and collected standardized UX measures, brief mood and stress ratings, and a post-study questionnaire. We found correlations between specific speech features and self-reported satisfaction and experience. Furthermore, a machine learning model trained on speech features achieved promising accuracy in classifying UX levels, indicating that speech-based classification might be a reasonable alternative to self-report instruments. Our findings support speech as a viable, real-time signal for implicitly measuring UX and point toward adaptive voice user interfaces (VUIs) that respond dynamically to emotional and usability-related vocal cues.
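The two analysis steps described above, correlating a speech feature with self-reported scores and classifying UX levels from speech features, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the feature names (pause_ratio, speech_rate), the toy values, the two-level "high"/"low" labels, and the nearest-centroid classifier. The abstract does not state which features, labels, or model the study used, and the numbers below are not study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def nearest_centroid_loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Each sample is held out in turn, class centroids are computed from the
    remaining samples, and the held-out sample is assigned to the class
    whose centroid is closest in Euclidean distance.
    """
    correct = 0
    for i, (held_out, true_label) in enumerate(zip(features, labels)):
        sums, counts = {}, {}
        for j, (vec, label) in enumerate(zip(features, labels)):
            if j == i:
                continue  # exclude the held-out sample from training
            acc = sums.setdefault(label, [0.0] * len(vec))
            for k, value in enumerate(vec):
                acc[k] += value
            counts[label] = counts.get(label, 0) + 1
        centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        predicted = min(
            centroids, key=lambda label: math.dist(held_out, centroids[label])
        )
        correct += predicted == true_label
    return correct / len(labels)


# Hypothetical toy data, NOT taken from the study.
pause_ratio = [0.10, 0.12, 0.15, 0.30, 0.35, 0.40]   # share of silence per utterance
speech_rate = [4.2, 4.0, 4.1, 3.0, 2.8, 2.9]         # syllables per second
satisfaction = [6.5, 6.0, 5.5, 3.5, 3.0, 2.0]        # self-reported score
ux_level = ["high", "high", "high", "low", "low", "low"]

r = pearson_r(pause_ratio, satisfaction)
# Features are left unscaled here for brevity; real features on different
# scales would normally be standardized before distance-based classification.
samples = [list(pair) for pair in zip(pause_ratio, speech_rate)]
accuracy = nearest_centroid_loo_accuracy(samples, ux_level)

print(f"pause_ratio vs. satisfaction: r = {r:.3f}")
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

In a within-subjects design such as this one, each participant contributes several recordings, so a realistic evaluation would hold out whole participants rather than single samples to avoid leaking speaker identity into the test set.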