CL AI HC | Sep 25, 2023

Art or Artifice? Large Language Models and the False Promise of Creativity

Tuhin Chakrabarty, Philippe Laban, Divyansh Agarwal, Smaranda Muresan, Chien-Sheng Wu

Microsoft, Salesforce

arXiv:2309.14556v3 | 22.7 | 265 citations | h-index: 36

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of assessing creativity in AI-generated content, a problem relevant to researchers and practitioners in natural language processing. Its contribution is incremental, since it applies existing creativity frameworks to LLMs.

The paper tackles the problem of objectively evaluating the creativity of writing by large language models (LLMs) relative to professional authors. It proposes the Torrance Test of Creative Writing (TTCW) and finds that LLM-generated stories pass 3-10 times fewer tests than stories by professionals. When LLMs are used as automated evaluators, none of them positively correlate with the expert assessments.

Researchers have argued that large language models (LLMs) exhibit high-quality writing capabilities from blogs to stories. However, objectively evaluating the creativity of a piece of writing is challenging. Inspired by the Torrance Test of Creative Thinking (TTCT), which measures creativity as a process, we use the Consensual Assessment Technique [3] and propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product. TTCW consists of 14 binary tests organized into the original dimensions of Fluency, Flexibility, Originality, and Elaboration. We recruit 10 creative writers and implement a human assessment of 48 stories written either by professional authors or LLMs using TTCW. Our analysis shows that LLM-generated stories pass 3-10X fewer TTCW tests than stories written by professionals. In addition, we explore the use of LLMs as assessors to automate the TTCW evaluation, revealing that none of the LLMs positively correlate with the expert assessments.
