AI, CL, LG | Nov 30, 2024

Forma mentis networks predict creativity ratings of short texts via interpretable artificial intelligence in human and GPT-simulated raters

arXiv:2412.00530v1 | 3 citations | h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the challenge of AI alignment in creativity assessment for researchers and practitioners, highlighting incremental insights into GPT-3.5's biases.

The study assesses creativity in short texts by comparing human and GPT-3.5 ratings, using textual forma mentis networks and explainable AI. It finds that GPT-3.5 ratings differ significantly from human ratings, both in their correlations and in the feature patterns that drive them, which points to key limitations in alignment.

Creativity is a fundamental skill of human cognition. We use textual forma mentis networks (TFMN) to extract network (semantic/syntactic associations) and emotional features from approximately one thousand human- and GPT-3.5-generated stories. Using Explainable Artificial Intelligence (XAI), we test whether features related to Mednick's associative theory of creativity can explain creativity ratings assigned by humans and GPT-3.5. Using XGBoost, we examine three scenarios: (i) human ratings of human stories, (ii) GPT-3.5 ratings of human stories, and (iii) GPT-3.5 ratings of GPT-generated stories. Our findings reveal that GPT-3.5 ratings differ significantly from human ratings not only in terms of correlations but also because of feature patterns identified with XAI methods. GPT-3.5 favours 'its own' stories and rates human stories differently from humans. Feature importance analysis with SHAP scores shows that: (i) network features are more predictive for human creativity ratings but also for GPT-3.5's ratings of human stories; (ii) emotional features play a greater role than semantic/syntactic network structure in GPT-3.5 rating its own stories. These quantitative results underscore key limitations in GPT-3.5's ability to align with human assessments of creativity. We emphasise the need for caution when using GPT-3.5 to assess and generate creative content, as it does not yet capture the nuanced complexity that characterises human creativity.
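
To make the idea of "network features" extracted from a story more concrete, here is a minimal sketch in Python. It is not the authors' pipeline: a real TFMN links words through syntactic parsing and semantic (synonym) relations and also carries emotional labels, whereas this sketch assumes a plain word co-occurrence window as a stand-in for those links. The function names `build_cooccurrence_network` and `network_features`, the `window` parameter, and the chosen features (density, mean degree, mean clustering) are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict


def build_cooccurrence_network(text, window=2):
    """Link words that occur within `window` tokens of each other in a sentence.

    Assumption: co-occurrence is only a rough proxy for the syntactic and
    semantic associations that a textual forma mentis network encodes.
    """
    graph = defaultdict(set)
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:  # skip self-loops
                    graph[word].add(other)
                    graph[other].add(word)
    return dict(graph)


def local_clustering(graph, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = sorted(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(
        1
        for i, a in enumerate(neighbours)
        for b in neighbours[i + 1 :]
        if b in graph[a]
    )
    return linked / (k * (k - 1) / 2)


def network_features(graph):
    """Summarise an undirected network as a small feature dictionary."""
    n = len(graph)
    m = sum(len(nb) for nb in graph.values()) // 2  # each edge is stored twice
    return {
        "nodes": n,
        "edges": m,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "mean_degree": 2 * m / n if n else 0.0,
        "mean_clustering": (
            sum(local_clustering(graph, v) for v in graph) / n if n else 0.0
        ),
    }


if __name__ == "__main__":
    story = "The cat sat. The cat ran."
    print(network_features(build_cooccurrence_network(story, window=1)))
```

A feature dictionary like this, computed per story, is the kind of input that could then be passed to a regressor such as XGBoost and inspected with SHAP. That step is omitted here because the abstract does not specify the model configuration.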
