CL | AI | CY | HC | Oct 19, 2025

Who's Asking? Simulating Role-Based Questions for Conversational AI Evaluation

Georgia Tech, UW
arXiv:2510.16829v1 | 1 citations | h-index: 8
Originality: Incremental advance
AI Analysis

The paper addresses the need for role-informed evaluation of conversational AI, particularly in stigmatized domains. The advance is incremental because it builds on existing role theory and simulation methods.

The paper addresses a gap in conversational AI evaluation: most evaluations ignore the asker's role, which matters especially in stigmatized domains such as opioid use disorder (OUD). It proposes CoRUS, a framework for simulating role-based questions, and finds that vulnerable roles (patients and caregivers) elicit more supportive responses (+17%) but less knowledge content (-19%) than practitioners.

Language model users often embed personal and social context in their questions. The asker's role -- implicit in how the question is framed -- creates specific needs for an appropriate response. However, most evaluations, while capturing the model's capability to respond, often ignore who is asking. This gap is especially critical in stigmatized domains such as opioid use disorder (OUD), where accounting for users' contexts is essential to provide accessible, stigma-free responses. We propose CoRUS (COmmunity-driven Roles for User-centric Question Simulation), a framework for simulating role-based questions. Drawing on role theory and posts from an online OUD recovery community (r/OpiatesRecovery), we first build a taxonomy of asker roles -- patients, caregivers, practitioners. Next, we use it to simulate 15,321 questions that embed each role's goals, behaviors, and experiences. Our evaluations show that these questions are both highly believable and comparable to real-world data. When used to evaluate five LLMs, for the same question but differing roles, we find systematic differences: vulnerable roles, such as patients and caregivers, elicit more supportive responses (+17%) and reduced knowledge content (-19%) in comparison to practitioners. Our work demonstrates how implicitly signaling a user's role shapes model responses, and provides a methodology for role-informed evaluation of conversational AI.
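The role comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes each model response has already been scored on some metric (the field names `role`, `support`, and `knowledge` are hypothetical), and it assumes the reported percentages are relative differences between the mean score for vulnerable roles and the mean score for practitioners, which the abstract does not state explicitly.

```python
from collections import defaultdict
from statistics import mean

# Role grouping taken from the abstract; the string labels are illustrative.
VULNERABLE_ROLES = {"patient", "caregiver"}
REFERENCE_ROLE = "practitioner"


def mean_score_by_group(records, metric):
    """Average one metric over vulnerable-role and reference-role responses."""
    groups = defaultdict(list)
    for rec in records:
        if rec["role"] in VULNERABLE_ROLES:
            groups["vulnerable"].append(rec[metric])
        elif rec["role"] == REFERENCE_ROLE:
            groups["reference"].append(rec[metric])
    return {name: mean(values) for name, values in groups.items()}


def relative_change(records, metric):
    """Relative difference of the vulnerable-role mean against the reference mean."""
    means = mean_score_by_group(records, metric)
    return (means["vulnerable"] - means["reference"]) / means["reference"]


# Placeholder scores for illustration only; these are NOT data from the paper.
toy_records = [
    {"role": "patient", "support": 3, "knowledge": 2},
    {"role": "caregiver", "support": 5, "knowledge": 2},
    {"role": "practitioner", "support": 2, "knowledge": 4},
]

support_change = relative_change(toy_records, "support")
knowledge_change = relative_change(toy_records, "knowledge")
```

A positive value means vulnerable roles received higher scores on that metric than practitioners, and a negative value means lower. In a real evaluation the records would pair the same underlying question across roles, as the abstract describes.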

