CL · Mar 10

CREATE: Testing LLMs for Associative Creativity

arXiv:2603.09970v1 · 37.44 citations · h-index: 7
Predicted impact: top 15% in CL · last 90 days · Originality: Synthesis-oriented
AI Analysis

This paper provides a benchmark that researchers can use to test and improve associative creativity in LLMs. The contribution is incremental in that it focuses on evaluation rather than on developing new methods.

The authors address the problem of evaluating associative creativity in large language models by introducing the CREATE benchmark, which measures a model's ability to generate diverse, specific paths between concepts. Frontier models achieve higher creative utility than weaker ones, but the large number of valid answers and the complexity of the search make the benchmark difficult to saturate.

A key component of creativity is associative reasoning: the ability to draw novel yet meaningful connections between concepts. We introduce CREATE, a benchmark designed to evaluate models' capacity for creative associative reasoning. CREATE requires models to generate sets of paths connecting concepts in a model's parametric knowledge. Paths should have high specificity (distinctiveness and closeness of the concept connection) and high diversity (dissimilarity from other paths), and models are scored more highly if they produce a larger set of strong, diverse paths. This task shares demands of real creativity tasks like hypothesis generation, including an extremely large search space, but enables collection of a sizable benchmark with objective answer grading. Evaluation of frontier models shows that the strongest models achieve higher creative utility than others, with the high multiplicity of answers and complexity of the search making benchmark saturation difficult to achieve. Furthermore, our results illustrate that thinking models are not always more effective on our task, even with high token budgets. Recent approaches for creative prompting give some but limited additional improvement. CREATE provides a sandbox for developing new methods to improve models' capacity for associative creativity.
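To make the scoring idea in the abstract concrete, here is a minimal toy sketch of how a set of paths could be rewarded for being both specific and diverse. It is not the paper's metric: the function names (`jaccard_similarity`, `toy_set_score`), the use of Jaccard overlap on intermediate concepts as a similarity measure, and the per-path specificity values supplied by the caller are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a path is a tuple of concepts from source to target,
# e.g. ("piano", "ivory", "elephant").
Path = Tuple[str, ...]


def jaccard_similarity(a: Path, b: Path) -> float:
    """Overlap between the intermediate concepts of two paths (0 = disjoint, 1 = identical)."""
    inner_a, inner_b = set(a[1:-1]), set(b[1:-1])
    if not inner_a and not inner_b:
        # Two direct links with no intermediate concepts are treated as identical.
        return 1.0
    return len(inner_a & inner_b) / len(inner_a | inner_b)


def toy_set_score(paths: Sequence[Path], specificity: Sequence[float]) -> float:
    """Sum each path's specificity, discounted by its similarity to paths already counted.

    Assumption: specificity scores in [0, 1] come from some external grader.
    """
    if len(paths) != len(specificity):
        raise ValueError("paths and specificity must have the same length")

    # Consider the most specific paths first, so that near-duplicates of a strong
    # path are the ones that get discounted.
    order = sorted(range(len(paths)), key=lambda i: specificity[i], reverse=True)

    counted: List[Path] = []
    total = 0.0
    for i in order:
        max_sim = max((jaccard_similarity(paths[i], p) for p in counted), default=0.0)
        total += specificity[i] * (1.0 - max_sim)
        counted.append(paths[i])
    return total
```

Under this toy scheme a repeated path adds nothing, while a new path through different intermediate concepts adds its full specificity, which mirrors the abstract's statement that larger sets of strong, diverse paths score more highly.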

