The creative psychometric item generator: a framework for item generation and validation using large language models
This work addresses the need for automated creativity tests in modern economies. Its contribution is incremental, since it builds on existing psychometric and LLM-based methods.
The authors address the problem of generating valid creativity assessments for humans with large language models (LLMs). They develop a framework, the creative psychometric item generator (CPIG), that iteratively develops prompts for writing creative problem-solving (CPS) items, and they report strong empirical evidence that it produces valid and reliable items.
Increasingly, large language models (LLMs) are being used to automate workplace processes requiring a high degree of creativity. While much prior work has examined the creativity of LLMs, there has been little research on whether they can generate valid creativity assessments for humans despite the increasingly central role of creativity in modern economies. We develop a psychometrically inspired framework for creating test items (questions) for a classic free-response creativity test: the creative problem-solving (CPS) task. Our framework, the creative psychometric item generator (CPIG), uses a mixture of LLM-based item generators and evaluators to iteratively develop new prompts for writing CPS items, such that items from later iterations will elicit more creative responses from test takers. We find strong empirical evidence that CPIG generates valid and reliable items and that this effect is not attributable to known biases in the evaluation process. Our findings have implications for employing LLMs to automatically generate valid and reliable creativity tests for humans and AI.
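The iterative generate-and-evaluate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `cpig_loop`, its parameters, the toy generator and scorer, and the rule of reusing the top-scoring items as exemplars for the next prompt are all assumptions made for illustration. In a real system, `generate_item` and `score_item` would wrap LLM calls, and the score would reflect how creative the responses elicited by an item are.

```python
import itertools
from typing import Callable, List, Tuple

def cpig_loop(
    seed_items: List[str],
    generate_item: Callable[[List[str]], str],
    score_item: Callable[[str], float],
    n_iterations: int = 3,
    items_per_iteration: int = 4,
    n_exemplars: int = 2,
) -> List[List[Tuple[str, float]]]:
    """Hypothetical generate-evaluate loop; returns (item, score) pairs per iteration."""
    exemplars = list(seed_items)
    history = []
    for _ in range(n_iterations):
        # Generate a batch of items conditioned on the current exemplars.
        scored = []
        for _ in range(items_per_iteration):
            item = generate_item(exemplars)
            scored.append((item, score_item(item)))
        history.append(scored)
        # Assumption: the best-scoring items become exemplars for the next prompt.
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        exemplars = [item for item, _ in ranked[:n_exemplars]]
    return history

# Toy stand-ins for the LLM-based generator and evaluator (not real models).
_variation = itertools.count(1)

def toy_generator(exemplars: List[str]) -> str:
    # Extends the longest exemplar, imitating "build on the best item so far".
    base = max(exemplars, key=len)
    return f"{base} (variation {next(_variation)})"

def toy_scorer(item: str) -> float:
    # Word count as a placeholder for an evaluator's creativity score.
    return float(len(item.split()))

history = cpig_loop(
    seed_items=["A coworker takes credit for your idea."],
    generate_item=toy_generator,
    score_item=toy_scorer,
)
means = [sum(score for _, score in batch) / len(batch) for batch in history]
print(means)
```

With these toy components the mean score per iteration does not decrease, which mirrors the intended behavior that later iterations yield items scoring higher under the evaluator. Whether this holds with real LLM generators and evaluators is an empirical question that the paper addresses.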