Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews
This paper addresses the need to evaluate LLMs from the user's perspective, though its contribution is incremental: it applies existing methods to gather user feedback.
The paper tackles the problem of understanding user opinions of large language models by introducing CLUE, an LLM-powered interviewer that conducts in-the-moment user experience interviews immediately after users interact with an LLM. In a study with thousands of users, it captures insights such as bipolar views on the displayed reasoning process of DeepSeek-R1 and demands for information freshness and multi-modality.
Which large language model (LLM) is better? Every evaluation tells a story, but what do users really think about current LLMs? This paper presents CLUE, an LLM-powered interviewer that conducts in-the-moment user experience interviews, right after users interact with LLMs, and automatically gathers insights about user opinions from massive interview logs. We conduct a study with thousands of users to understand user opinions on mainstream LLMs, recruiting users to first chat with a target LLM and then be interviewed by CLUE. Our experiments demonstrate that CLUE captures interesting user opinions, e.g., the bipolar views on the displayed reasoning process of DeepSeek-R1 and demands for information freshness and multi-modality. Our code and data are at https://github.com/cxcscmu/LLM-Interviewer.
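The two-phase study flow described above (chat with a target LLM, then an interview right after) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the repository linked above. The names `Session`, `run_session`, `target_llm`, and `interviewer_llm` are hypothetical, and both models are stand-in callables that map a text prompt to a text reply.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class Session:
    """Logs for one participant: the chat phase and the interview phase."""
    chat_log: List[str] = field(default_factory=list)
    interview_log: List[str] = field(default_factory=list)


def run_session(
    target_llm: LLM,
    interviewer_llm: LLM,
    user_turns: List[str],
    user_answers: List[str],
) -> Session:
    """Run the chat phase, then the interview phase immediately afterwards."""
    session = Session()

    # Phase 1: the user chats with the target LLM.
    for message in user_turns:
        reply = target_llm(message)
        session.chat_log += [f"user: {message}", f"assistant: {reply}"]

    # Phase 2: the interviewer asks one question per user answer,
    # seeing the chat log and the interview so far as context.
    for answer in user_answers:
        context = "\n".join(session.chat_log + session.interview_log)
        question = interviewer_llm(context)
        session.interview_log += [f"interviewer: {question}", f"user: {answer}"]

    return session


if __name__ == "__main__":
    # Stub models for illustration only; real use would call actual LLM APIs.
    demo = run_session(
        target_llm=lambda msg: f"(answer to: {msg})",
        interviewer_llm=lambda ctx: "How did you find the response?",
        user_turns=["What is the capital of France?"],
        user_answers=["It was accurate but a bit terse."],
    )
    print("\n".join(demo.chat_log + demo.interview_log))
```

Passing the chat log into the interviewer's context is the assumption that makes the interview "in the moment": questions can refer to what the user just experienced. The automatic gathering of insights from the resulting interview logs is not sketched here.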