What Makes a Good Response? An Empirical Analysis of Quality in Qualitative Interviews
This work provides grounded metrics that qualitative researchers and NLP developers can use to improve interview design and to evaluate automated interview systems. Its contribution is incremental, however, since it validates existing measures rather than proposing new ones.
The study addresses the problem of validating measures of interview response quality. It evaluates 10 previously proposed measures by testing whether high-scoring responses actually contribute to the study findings. Direct relevance to a key research question was the strongest predictor, while clarity and surprisal-based informativeness were not predictive.
Qualitative interviews provide essential insights into human experiences when they elicit high-quality responses. While qualitative and NLP researchers have proposed various measures of interview quality, these measures lack validation that high-scoring responses actually contribute to the study's goals. In this work, we identify, implement, and evaluate 10 proposed measures of interview response quality to determine which are actually predictive of a response's contribution to the study findings. To conduct our analysis, we introduce the Qualitative Interview Corpus, a newly constructed dataset of 343 interview transcripts with 16,940 participant responses from 14 real research projects. We find that direct relevance to a key research question is the strongest predictor of response quality. We additionally find that two measures commonly used to evaluate NLP interview systems, clarity and surprisal-based informativeness, are not predictive of response quality. Our work provides analytic insights and grounded, scalable metrics to inform the design of qualitative studies and the evaluation of automated interview systems.
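The abstract does not say how surprisal-based informativeness is computed or how the authors measure whether a metric is "predictive". The sketch below is therefore only an illustration under stated assumptions. It scores each response by its mean per-token surprisal (bits) under an add-one smoothed unigram model built from a reference corpus. It then summarizes how well those scores separate responses labeled as contributing to the findings (1) from those that did not (0), using AUC. An AUC near 0.5 means the measure carries little signal about contribution. The tokenizer, the unigram model, the use of AUC, every function name, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_unigram(reference_texts: list[str]) -> tuple[Counter, int]:
    # Token counts and total token count over a hypothetical reference corpus.
    counts = Counter(tok for t in reference_texts for tok in tokenize(t))
    return counts, sum(counts.values())


def mean_surprisal(response: str, counts: Counter, total: int) -> float:
    # Average -log2 p(w) under an add-one smoothed unigram model.
    # The extra +1 in the vocabulary size reserves mass for unseen tokens.
    vocab = len(counts) + 1
    toks = tokenize(response)
    if not toks:
        return 0.0
    bits = [-math.log2((counts[t] + 1) / (total + vocab)) for t in toks]
    return sum(bits) / len(bits)


def auc(scores: list[float], labels: list[int]) -> float:
    # Probability that a randomly chosen contributing response (label 1)
    # scores higher than a non-contributing one (label 0); ties count 0.5.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up data purely to show the call pattern (not from the corpus).
reference = ["i liked the class", "the class was fine", "i did the homework"]
counts, total = build_unigram(reference)
responses = ["the class was fine", "my advisor ignored my emails for months"]
labels = [0, 1]
scores = [mean_surprisal(r, counts, total) for r in responses]
print(auc(scores, labels))  # 1.0 on this toy pair
```

A pairwise AUC is used here because it needs no threshold and treats any scalar quality measure the same way. A regression over all 10 measures at once would be a natural alternative, but that too is an assumption rather than the paper's stated method.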