Eduard Kuric

2 Papers

40.4 · HC · May 18
What Would GPT Click: Practical Effects of Human-AI Behavioral Misalignment and the Cost of Synthetic Participants in User Experience

Eduard Kuric, Peter Demcak, Matus Krajcovic

Synthetic participants represent a methodologically concerning concept that threatens the integrity of UX research. Findings from previous experiments show that AI outputs are misaligned with the behaviors and thoughts of real humans in various ways. However, industry voices keep underestimating the severity of these misalignments, advocating for practical compromises in which good-enough data does not need to be perfect and all issues will be solved by future tuning. Our study tackles the lack of systematic understanding of the practical issues that arise with synthetic behavior and its use for steering decisions within real contexts. Across twelve diverse first-click tests (n = 3431) obtained from real UX practice, we examine the ability of GPT to predict where humans click and how they reason about their behavior. Results (e.g., a distribution significantly different from real data in 53% of tasks) demonstrate critical failures to reflect the patterns in which users click on visual elements and the underlying cognitive processes. Participant personas, chain-of-thought reasoning in GPT, and different sampling parameters fail to produce meaningful fidelity improvements and merely inflate believability. We expose a multitude of nuanced distortions in synthetic responses that reduce their overall analytical usefulness as a decision-making resource compared with real data. The observed distortions can be theoretically linked to properties categorically inherent to LLMs: their statistical nature and their encoding of semantic heuristics, which depends on training on linguistic data.

42.4 · HC · May 18
Distorted Perspectives of LLM-Simulated Preferences: Can AI Mislead Design?

Eduard Kuric, Peter Demcak, Matus Krajcovic

Designers of digital solutions increasingly consult Large Language Models (LLMs) for their work. However, it remains unclear how this may affect the user experiences they produce, and there are no established practices. We investigate how design preferences expressed by LLM-driven simulation methods align with those of real users. We present a study that aggregates real-world data and design stimuli from twenty-nine preference tests conducted in practice by users of the UXtweak online research platform (n = 2073). We perform holistic multimodal simulations in which we manipulate LLM variables (model reasoning, sampling, persona type, and specificity) and assess their effects on algorithmic fidelity. Our results reveal significant and systematic discrepancies between people's real design preferences and LLM simulations that are consistent across manipulations. Synthetic justifications lack genuine depth, nuance, and reasoning, which they replace with patterns such as a focus on generic properties, specific elements, elaboration, and overpraising. The unique attention this research directs toward preferences within visual design stimuli highlights the misrepresentation of perception and meaning by LLMs in a context that is intuitive yet critical for design teams. The external and ecological validity of our findings is high, given their replication across a multitude of real-world studies.