LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings
The work addresses companies' need for scalable, realistic consumer research simulations, though the contribution is incremental because it builds on existing LLM methods.
The paper tackles the unrealistic response distributions that LLMs produce when used in consumer research. It introduces semantic similarity rating (SSR), which maps LLM textual responses to Likert distributions using embedding similarity to reference statements. On a dataset of 57 surveys with 9,300 human responses, SSR achieves 90% of human test-retest reliability and a KS similarity above 0.85.
Consumer research costs companies billions annually yet suffers from panel biases and limited scale. Large language models (LLMs) offer an alternative by simulating synthetic consumers, but produce unrealistic response distributions when asked directly for numerical ratings. We present semantic similarity rating (SSR), a method that elicits textual responses from LLMs and maps these to Likert distributions using embedding similarity to reference statements. Testing on an extensive dataset comprising 57 personal care product surveys conducted by a leading corporation in that market (9,300 human responses), SSR achieves 90% of human test-retest reliability while maintaining realistic response distributions (KS similarity > 0.85). Additionally, these synthetic respondents provide rich qualitative feedback explaining their ratings. This framework enables scalable consumer research simulations while preserving traditional survey metrics and interpretability.
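To make the mapping step concrete, the following is a minimal sketch of how a free-text response could be turned into a Likert distribution and compared with a human distribution. It is an illustration under stated assumptions, not the authors' implementation: the reference statements are hypothetical, `embed` is a toy bag-of-words stand-in for a real text-embedding model, the similarities are normalized proportionally (the abstract does not specify the normalization), and `ks_similarity` assumes that "KS similarity" means one minus the Kolmogorov-Smirnov distance between the two cumulative distributions.

```python
import math
from collections import Counter

# Hypothetical reference statements, one per Likert point (1 to 5).
# The statements used in the paper are not given in the abstract.
REFERENCE_STATEMENTS = [
    "I would definitely not buy this product",
    "I would probably not buy this product",
    "I am not sure whether I would buy this product",
    "I would probably buy this product",
    "I would definitely buy this product",
]


def embed(text):
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ssr_distribution(response, references=REFERENCE_STATEMENTS):
    """Map a textual response to a probability distribution over Likert points.

    Assumption: similarities are normalized proportionally so they sum to 1.
    """
    response_vec = embed(response)
    sims = [cosine(response_vec, embed(ref)) for ref in references]
    total = sum(sims)
    if total == 0:
        # No overlap with any reference statement: fall back to uniform.
        return [1.0 / len(sims)] * len(sims)
    return [s / total for s in sims]


def ks_similarity(p, q):
    """One minus the largest gap between the cumulative distributions of p and q."""
    cum_p = cum_q = 0.0
    max_gap = 0.0
    for p_i, q_i in zip(p, q):
        cum_p += p_i
        cum_q += q_i
        max_gap = max(max_gap, abs(cum_p - cum_q))
    return 1.0 - max_gap
```

In this sketch, averaging `ssr_distribution` over many synthetic respondents would give a survey-level Likert distribution, which `ks_similarity` could then compare against the distribution of human responses for the same survey.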