Pavel Okopnyi

HC · Dec 2, 2025

In Silico Development of Psychometric Scales: Feasibility of Representative Population Data Simulation with LLMs

Enrico Cipriani, Pavel Okopnyi, Danilo Menicucci et al.

Developing and validating psychometric scales requires large samples, multiple testing phases, and substantial resources. Recent advances in Large Language Models (LLMs) enable the generation of synthetic participant data by prompting models to answer items while impersonating individuals of specific demographic profiles, potentially allowing in silico piloting before real data collection. Across four preregistered studies (N = circa 300 each), we tested whether LLM-simulated datasets can reproduce the latent structures and measurement properties of human responses. In Studies 1-2, we compared LLM-generated data with real datasets for two validated scales; in Studies 3-4, we created new scales using EFA on simulated data and then examined whether these structures generalized to newly collected human samples. Simulated datasets replicated the intended factor structures in three of four studies and showed consistent configural and metric invariance, with scalar invariance achieved for the two newly developed scales. However, correlation-based tests revealed substantial differences between real and synthetic datasets, and notable discrepancies appeared in score distributions and variances. Thus, while LLMs capture group-level latent structures, they do not approximate individual-level data properties. Simulated datasets also showed full internal invariance across gender. Overall, LLM-generated data appear useful for early-stage, group-level psychometric prototyping, but not as substitutes for individual-level validation. We discuss methodological limitations, risks of bias and data pollution, and ethical considerations related to in silico psychometric simulations.

HC · Jan 9, 2018

Between an Arena and a Sports Bar: Online Chats of eSports Spectators

Denis Bulygin, Ilya Musabirov, Alena Suvorova et al.

Hundreds of thousands of spectators use Twitch.tv to watch The International, a Dota 2 eSports tournament, and to communicate in massive chats. In this paper, we analyse crowd behavior in these chats and disentangle features of social communication, such as the contextual meanings of emojis and short messages. We apply structural topic modelling and cross-correlation analysis to investigate topical and temporal patterns of chat messages and their relation to in-game events. We show that in-game events drive communication in the massive chat and define its emergent topical structure to varying extents. Following the discussion in the communication and social computing literature, we interpret these findings in the framework of the analysis of communication in physical sports crowds and outline some limitations of the 'stadium' metaphor, suggesting a complementary metaphor of the 'sports bar' as a useful analytical and design device.

2 Papers