HCCL | Feb 27, 2025

Telephone Surveys Meet Conversational AI: Evaluating a LLM-Based Telephone Survey System at Scale

arXiv:2502.20140v2 | 4 citations | h-index: 2
AI Analysis

This work addresses the high cost and training burden of human interviewers in telephone surveys. It offers a scalable option for market research, social science, and public opinion studies, though the contribution is incremental because it builds on existing conversational AI technologies.

The researchers tackled the resource-intensive nature of telephone surveys by developing an AI-driven system using LLMs, TTS, and STT, which successfully administered surveys at scale in a pilot study (n=75) in the U.S. and a large-scale deployment (n=2,739) in Peru, with data quality approaching human standards for structured items.

Telephone surveys remain a valuable tool for gathering insights but typically require substantial resources in training and coordinating human interviewers. This work presents an AI-driven telephone survey system integrating text-to-speech (TTS), a large language model (LLM), and speech-to-text (STT) that mimics the versatility of human-led interviews (full-duplex dialogues) at scale. We tested the system across two populations, a pilot study in the United States (n = 75) and a large-scale deployment in Peru (n = 2,739), inviting participants via web-based links and contacting them via direct phone calls. The AI agent successfully administered open-ended and closed-ended questions, handled basic clarifications, and dynamically navigated branching logic, allowing fast large-scale survey deployment without interviewer recruitment or training. Our findings demonstrate that while the AI system's probing for qualitative depth was more limited than that of human interviewers, overall data quality approached human-led standards for structured items. This study represents one of the first successful large-scale deployments of an LLM-based telephone interviewer in a real-world survey context. The AI-powered telephone survey system has the potential to expand scalable, consistent data collection across market research, social science, and public opinion studies, thus improving operational efficiency while maintaining appropriate data quality for research.
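
The abstract mentions closed-ended and open-ended questions, basic clarifications, and branching logic but does not describe how they are implemented. The following is a minimal sketch of what such a questionnaire flow could look like, not the authors' system. Every name here (`Question`, `run_survey`, `answer_fn`, the sample questions) is hypothetical, and `answer_fn` is only a stand-in for the real STT and LLM turn that would produce a respondent's answer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Question:
    """One survey item (hypothetical structure, not from the paper)."""
    qid: str
    text: str
    options: Optional[List[str]] = None          # None means open-ended
    branches: Dict[str, str] = field(default_factory=dict)  # answer -> next qid
    default_next: Optional[str] = None           # used when no branch matches


def run_survey(
    questions: Dict[str, Question],
    start: str,
    answer_fn: Callable[[Question, int], str],
    max_retries: int = 2,
) -> Dict[str, Optional[str]]:
    """Walk the questionnaire, re-asking closed-ended items on invalid answers.

    answer_fn(question, attempt) stands in for one spoken exchange
    (assumption: the real system would call TTS, STT, and an LLM here).
    """
    responses: Dict[str, Optional[str]] = {}
    qid: Optional[str] = start
    while qid is not None:
        q = questions[qid]
        answer: Optional[str] = None
        for attempt in range(max_retries + 1):
            raw = answer_fn(q, attempt).strip()
            if q.options is None:                # open-ended: accept as given
                answer = raw
                break
            if raw.lower() in q.options:         # closed-ended: must match an option
                answer = raw.lower()
                break
        responses[qid] = answer                  # stays None if never matched
        # Branch on the recorded answer, else fall through to the default.
        qid = q.branches.get(answer, q.default_next) if answer else q.default_next
    return responses


# Illustrative questionnaire with one skip rule (made-up content).
QUESTIONS = {
    "q1": Question("q1", "Do you own a mobile phone?", ["yes", "no"],
                   branches={"yes": "q2", "no": "q3"}),
    "q2": Question("q2", "What do you mainly use it for?", default_next="q3"),
    "q3": Question("q3", "May we contact you for a follow-up?", ["yes", "no"]),
}
```

In this sketch a respondent who answers "no" to `q1` skips `q2` entirely, and an unrecognized answer to a closed-ended item is simply asked again up to `max_retries` times, which is a much cruder form of clarification than an LLM-driven agent would provide.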
