AI | Dec 4, 2024

Large Language Models show both individual and collective creativity comparable to humans

Luning Sun, Yuzhuo Yuan, Yuan Yao, Yanyan Li, Hao Zhang, Xing Xie, Xiting Wang, Fang Luo, David Stillwell

Cambridge

arXiv:2412.03151v18.515 citations | h-index: 50 | Thinking Skills and Creativity

Originality: Incremental advance

AI Analysis

This study addresses the problem of assessing AI's potential impact on creative work by comparing LLM creativity with human creativity, though its advance is incremental because it benchmarks existing models.

The study measured the creativity of Large Language Models (LLMs) across 13 tasks and found that the best models rank in the 52nd percentile against humans. LLMs excel in divergent thinking and problem solving but lag in creative writing, and when questioned 10 times, an LLM's collective creativity is equivalent to that of 8-10 humans.

Artificial intelligence has, so far, largely automated routine tasks, but what does it mean for the future of work if Large Language Models (LLMs) show creativity comparable to humans? To measure the creativity of LLMs holistically, the current study uses 13 creative tasks spanning three domains. We benchmark the LLMs against individual humans, and also take a novel approach by comparing them to the collective creativity of groups of humans. We find that the best LLMs (Claude and GPT-4) rank in the 52nd percentile against humans, and overall LLMs excel in divergent thinking and problem solving but lag in creative writing. When questioned 10 times, an LLM's collective creativity is equivalent to 8-10 humans. When more responses are requested, two additional responses of LLMs equal one extra human. Ultimately, LLMs, when optimally applied, may compete with a small group of humans in the future of work.
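The scaling claim in the abstract can be read as simple arithmetic. The sketch below is only an illustration: the function name `human_equivalent_range` is hypothetical, and it assumes the "two additional responses equal one extra human" rate applies linearly to every response beyond the first 10, which the abstract does not state explicitly.

```python
from typing import Tuple


def human_equivalent_range(n_responses: int) -> Tuple[float, float]:
    """Rough human-equivalent group size for a number of LLM responses.

    Assumption (not from the paper): linear extrapolation from the two
    figures in the abstract. 10 responses correspond to 8-10 humans, and
    every 2 further responses add 1 human.
    """
    if n_responses < 10:
        # The abstract gives no figure below 10 responses, so we refuse to guess.
        raise ValueError("the abstract only gives figures for 10 or more responses")
    extra_humans = (n_responses - 10) / 2
    return (8 + extra_humans, 10 + extra_humans)
```

Under these assumptions, 20 responses would correspond to roughly 13 to 15 humans. The paper itself should be consulted for the actual fitted relationship.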
