CL · Aug 7, 2025

Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations

arXiv:2508.05470v2 · 7 citations · h-index: 10
Originality Synthesis-oriented
AI Analysis

This paper addresses the problem of unreliable creativity assessment, which affects researchers and practitioners in AI and creative domains, but its contribution is incremental: it critiques existing metrics rather than proposing new solutions.

The paper systematically analyzes existing creativity evaluation metrics across multiple domains and finds that they exhibit limited consistency and capture different dimensions of creativity, highlighting key limitations such as bias and instability.

We systematically examine, analyze, and compare representative creativity measures--creativity index, perplexity, syntactic templates, and LLM-as-a-Judge--across diverse creative domains, including creative writing, unconventional problem-solving, and research ideation. Our analyses reveal that these metrics exhibit limited consistency, capturing different dimensions of creativity. We highlight key limitations, including the creativity index's focus on lexical diversity, perplexity's sensitivity to model confidence, and syntactic templates' inability to capture conceptual creativity. Additionally, LLM-as-a-Judge shows instability and bias. Our findings underscore the need for more robust, generalizable evaluation frameworks that better align with human judgments of creativity.
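To make the point about perplexity's sensitivity to model confidence concrete, here is a minimal sketch using the standard definition of perplexity: the exponential of the negative mean log-probability per token. The function name and the per-token probabilities are hypothetical and chosen for illustration only; they are not taken from the paper, and this is not the authors' evaluation code.

```python
import math


def perplexity(token_logprobs):
    """Return exp(-mean(log p)) for a sequence of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical scores for the same 4-token text under two scoring models.
# One model assigns each token probability 0.5, the other only 0.1.
confident = [math.log(0.5)] * 4
uncertain = [math.log(0.1)] * 4

ppl_confident = perplexity(confident)  # approximately 2
ppl_uncertain = perplexity(uncertain)  # approximately 10

print(f"confident scorer: {ppl_confident:.2f}")
print(f"uncertain scorer: {ppl_uncertain:.2f}")
```

The text is identical in both cases, yet the score changes fivefold with the scorer's confidence. This illustrates why perplexity on its own is hard to read as a measure of how creative the text is.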

