CL | AI | Apr 22, 2025

Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach

arXiv:2504.15784v1 | 20 citations | h-index: 14 | EMNLP
Originality: Incremental advance
AI Analysis

This paper addresses the problem of costly and misaligned creativity evaluation for LLM users in creative domains, though its contribution is incremental because it builds on existing tests.

The paper tackles the challenge of evaluating the creativity of LLM-generated text by proposing an automated, reference-based method built on the Torrance Test of Creative Writing (TTCW). The method improves alignment with human assessments, reaching a pairwise accuracy of 0.75 (a 15% increase).

Creative writing is a key capability of Large Language Models (LLMs), with potential applications in literature, storytelling, and various creative domains. However, evaluating the creativity of machine-generated texts remains a significant challenge, as existing methods either rely on costly manual annotations or fail to align closely with human assessments. In this paper, we propose an effective automated evaluation method based on the Torrance Test of Creative Writing (TTCW), which evaluates creativity as product. Our method employs a reference-based Likert-style approach, scoring generated creative texts relative to high-quality reference texts across various tests. Experimental results demonstrate that our method significantly improves the alignment between LLM evaluations and human assessments, achieving a pairwise accuracy of 0.75 (+15%).
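
To make the reported metric concrete, the following is a minimal sketch of how reference-based Likert ratings could be aggregated and then compared with human judgments via pairwise accuracy. It rests on assumptions not stated in the abstract: each test yields an integer rating of the candidate relative to the reference (here on a hypothetical -2 to +2 scale), ratings are summed across tests, and pairs tied in the human scores are skipped. The function names and all data are illustrative, not the paper's implementation.

```python
from itertools import combinations


def aggregate_score(per_test_ratings):
    """Sum per-test Likert ratings of a candidate relative to the reference.

    Assumption: simple summation; the paper may aggregate differently.
    """
    return sum(per_test_ratings)


def pairwise_accuracy(human_scores, model_scores):
    """Fraction of text pairs that the model orders the same way as humans.

    Assumption: pairs tied in the human scores are skipped, and a model tie
    on a pair that humans do rank counts as a disagreement.
    """
    if len(human_scores) != len(model_scores):
        raise ValueError("score lists must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(human_scores)), 2):
        human_diff = human_scores[i] - human_scores[j]
        if human_diff == 0:
            continue  # no human preference to compare against
        model_diff = model_scores[i] - model_scores[j]
        total += 1
        if human_diff * model_diff > 0:
            agree += 1
    if total == 0:
        raise ValueError("no comparable pairs")
    return agree / total


# Hypothetical per-test ratings for four generated stories.
ratings = {
    "story_a": [1, 0, -1, 2],
    "story_b": [-1, -2, 0, -1],
    "story_c": [0, 1, 1, 0],
    "story_d": [2, 1, 0, 1],
}
model_scores = [aggregate_score(r) for r in ratings.values()]
human_scores = [3, 1, 4, 2]  # hypothetical human creativity scores

accuracy = pairwise_accuracy(human_scores, model_scores)
print(f"pairwise accuracy: {accuracy:.2f}")
```

Under this definition, a pairwise accuracy of 0.75 would mean the automated scores rank three out of every four comparable text pairs in the same order as the human assessments.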
