Ehsan Jahangirzadeh Soure

11.9HCMar 7, 2023

Collaboration with Conversational AI Assistants for UX Evaluation: Questions and How to Ask them (Voice vs. Text)

Emily Kuang, Ehsan Jahangirzadeh Soure, Mingming Fan et al.

AI is promising in assisting UX evaluators with analyzing usability tests, but its judgments are typically presented as non-interactive visualizations. Evaluators may have questions about test recordings, but have no way of asking them. Interactive conversational assistants provide a Q&A dynamic that may improve analysis efficiency and evaluator autonomy. To understand the full range of analysis-related questions, we conducted a Wizard-of-Oz design probe study with 20 participants who interacted with simulated AI assistants via text or voice. We found that participants asked for five categories of information: user actions, user mental model, help from the AI assistant, product and task information, and user demographics. Those who used the text assistant asked more questions, but the question lengths were similar. The text assistant was perceived as significantly more efficient, but both were rated equally in satisfaction and trust. We also provide design considerations for future conversational AI assistants for UX evaluation.

8.5HCMar 14

"It Became My Buddy, But I'm Not Afraid to Disagree": A Multi-Session Study of UX Evaluators Collaborating with Conversational AI Assistants

Emily Kuang, Ehsan Jahangirzadeh Soure, Luyao Shen et al.

AI-assisted usability analysis can potentially reduce the time and effort of finding usability problems, yet little is known about how AI's perceived expertise influences evaluators' analytic strategies and perceptions over time. We ran a within-subjects, five-session study (six hours per participant) with 12 professional UX evaluators who worked with two conversational assistants designed to appear novice- or expert-like (differing in suggestion quantity and response accuracy). We logged behavioral measures (number of passes, suggestion acceptance rate), collected subjective ratings (trust, perceived efficiency), and conducted semi-structured interviews. Participants experienced an initial novelty effect and a subsequent dip in trust that recovered over time. Their efficiency improved as they shifted from a two-pass to a one-pass video inspection approach. Evaluators ultimately rated the experienced CA as significantly more efficient, trustworthy, and comprehensive, despite not perceiving expertise differences early on. We conclude with design implications for adapting AI expertise to enable calibrated human-AI collaboration.

Ehsan Jahangirzadeh Soure

2 Papers