CL AI HC
Sep 1, 2025

Mic Drop or Data Flop? Evaluating the Fitness for Purpose of AI Voice Interviewers for Data Collection within Quantitative & Qualitative Research Contexts

Shreyas Tirumala, Nishant Jain, Danny D. Leybzon, Trent D. Buskirk

arXiv:2509.01814v14.91 citations
h-index: 21

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of reliable data collection for researchers, but its contribution is incremental: it reviews existing evidence rather than proposing new methods.

This paper evaluates the fitness of AI voice interviewers for data collection in quantitative and qualitative research, finding that they exceed traditional IVR systems but face challenges like transcription errors and limited emotion detection, which may limit their utility in qualitative contexts.

Transformer-based Large Language Models (LLMs) have paved the way for "AI interviewers" that can administer voice-based surveys to respondents in real time. This position paper reviews emerging evidence to understand when such AI interviewing systems are fit for purpose for collecting data within quantitative and qualitative research contexts. We evaluate the capabilities of AI interviewers as well as current Interactive Voice Response (IVR) systems across two dimensions: input/output performance (i.e., speech recognition, answer recording, emotion handling) and verbal reasoning (i.e., ability to probe, clarify, and handle branching logic). Field studies suggest that AI interviewers already exceed IVR capabilities for both quantitative and qualitative data collection, but real-time transcription error rates, limited emotion detection abilities, and uneven follow-up quality indicate that the utility, use, and adoption of current AI interviewer technology may be context-dependent for qualitative data collection efforts.
