CL · AI · Oct 14, 2025

Deep Associations, High Creativity: A Simple yet Effective Metric for Evaluating Large Language Models

arXiv:2510.12110v113.06 citations · h-index: 1 · Has Code · EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the challenge of assessing LLM creativity efficiently, which matters to researchers and developers. It is incremental in that it builds on existing human creativity assessment methods.

The paper tackles the evaluation of creativity in large language models (LLMs) by proposing PACE, a metric that asks LLMs to generate parallel association chains. PACE correlates strongly with Chatbot Arena Creative Writing rankings (Spearman's ρ = 0.739), and the analysis finds that high-performing LLMs match average human creativity but lag behind professionals.

The evaluation of LLMs' creativity represents a crucial research domain, though challenges such as data contamination and costly human assessments often impede progress. Drawing inspiration from human creativity assessment, we propose PACE, asking LLMs to generate Parallel Association Chains to Evaluate their creativity. PACE minimizes the risk of data contamination and offers a straightforward, highly efficient evaluation, as evidenced by its strong correlation with Chatbot Arena Creative Writing rankings (Spearman's $\rho = 0.739$, $p < 0.001$) across various proprietary and open-source models. A comparative analysis of associative creativity between LLMs and humans reveals that while high-performing LLMs achieve scores comparable to average human performance, professional humans consistently outperform LLMs. Furthermore, linguistic analysis reveals that both humans and LLMs exhibit a trend of decreasing concreteness in their associations, with humans demonstrating a greater diversity of associative patterns.
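The headline result is a Spearman rank correlation between a per-model metric score and an external ranking. As a rough illustration of what that statistic measures, the sketch below computes Spearman's ρ in plain Python by ranking both score lists and taking the Pearson correlation of the ranks. The function names and the five score pairs are hypothetical placeholders, not data or code from the paper, and the sketch does not reproduce how PACE itself scores association chains.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-model values, invented purely for illustration.
metric_scores = [0.62, 0.48, 0.71, 0.55, 0.40]  # e.g. a creativity metric
arena_scores = [1250, 1180, 1300, 1150, 1100]   # e.g. a leaderboard rating

rho = spearman_rho(metric_scores, arena_scores)
print(round(rho, 3))  # 0.9
```

Because ρ depends only on rank order, it is insensitive to the scale of either score, which is why a metric on a 0 to 1 scale can be compared directly with leaderboard ratings in the thousands. A significance test such as the reported $p < 0.001$ would need an additional step that is not shown here.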
