AI · CL · LG · Nov 30, 2024

Forma mentis networks predict creativity ratings of short texts via interpretable artificial intelligence in human and GPT-simulated raters

Edith Haim, Natalie Fischer, Salvatore Citraro, Giulio Rossetti, Massimo Stella

arXiv:2412.00530v1 · 4.23 citations · h-index: 10

Originality: Incremental advance

AI Analysis

Aimed at researchers and practitioners, this work addresses AI alignment in creativity assessment and offers incremental insights into GPT-3.5's biases.

The study assesses creativity in short texts by comparing human and GPT-3.5 ratings with textual forma mentis networks and explainable AI. It finds that GPT-3.5 ratings differ significantly from human ratings, both in their correlations and in the feature patterns that explain them, revealing key limitations in alignment.
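The abstract describes textual forma mentis networks as capturing semantic/syntactic associations between words, alongside emotional features. As a rough, hypothetical illustration of what "network features" of a story can mean, the Python sketch below swaps in a much simpler construction: it links words that co-occur within a small window (an assumption, not the TFMN method, which this page does not detail) and computes a few generic structural statistics. Emotional features are omitted, and the names `cooccurrence_network` and `network_features` are illustrative only.

```python
from itertools import combinations


def cooccurrence_network(text, window=2):
    """Hypothetical stand-in for a TFMN: link each word to the next
    `window` words (undirected, no self-loops)."""
    # Crude tokenisation: split on whitespace, strip punctuation, lowercase
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    adj = {}
    for i, w in enumerate(words):
        adj.setdefault(w, set())
        for v in words[i + 1 : i + 1 + window]:
            if v != w:
                adj[w].add(v)
                adj.setdefault(v, set()).add(w)
    return adj


def network_features(adj):
    """Generic structural statistics of an undirected adjacency-set graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    clustering = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)
            continue
        # Fraction of neighbour pairs that are themselves connected
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        clustering.append(2 * links / (k * (k - 1)))
    return {
        "nodes": n,
        "edges": m,
        "mean_degree": 2 * m / n if n else 0.0,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "mean_clustering": sum(clustering) / n if n else 0.0,
    }


if __name__ == "__main__":
    print(network_features(cooccurrence_network("The cat saw the dog.", window=1)))
```

A real pipeline would replace the window rule with parsed syntactic dependencies and semantic relations; the feature-extraction step would look similar once the graph exists.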

Creativity is a fundamental skill of human cognition. We use textual forma mentis networks (TFMN) to extract network (semantic/syntactic associations) and emotional features from approximately one thousand human- and GPT-3.5-generated stories. Using Explainable Artificial Intelligence (XAI), we test whether features related to Mednick's associative theory of creativity can explain creativity ratings assigned by humans and GPT-3.5. Using XGBoost, we examine three scenarios: (i) human ratings of human stories, (ii) GPT-3.5 ratings of human stories, and (iii) GPT-3.5 ratings of GPT-generated stories. Our findings reveal that GPT-3.5 ratings differ significantly from human ratings, not only in their correlations but also in the feature patterns identified with XAI methods. GPT-3.5 favours 'its own' stories and rates human stories differently from humans. Feature importance analysis with SHAP scores shows that: (i) network features are more predictive of human creativity ratings, and also of GPT-3.5's ratings of human stories; (ii) emotional features play a greater role than semantic/syntactic network structure in GPT-3.5's ratings of its own stories. These quantitative results underscore key limitations in GPT-3.5's ability to align with human assessments of creativity. We emphasise the need for caution when using GPT-3.5 to assess and generate creative content, as it does not yet capture the nuanced complexity that characterises human creativity.
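The abstract ranks features by SHAP scores computed on XGBoost models, and SHAP scores are built on Shapley values. The Python sketch below computes exact Shapley values by enumerating every feature coalition for a hypothetical three-feature rater, `toy_rater`; features outside a coalition are set to a fixed baseline. This is a simplified stand-in under that assumption: it is not the `shap` library's API, and tree-model SHAP implementations estimate the effect of absent features from data rather than from a single baseline. The feature names in the comments are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, replacing features that are
    absent from a coalition with the corresponding `baseline` value."""
    n = len(x)
    phi = [0.0] * n

    def value(coalition):
        # Features in the coalition come from x, the rest from the baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from `others`
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_rater(z):
    """Hypothetical creativity rater: z = [network_density,
    mean_clustering, joy_ratio]; includes one interaction term."""
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]


if __name__ == "__main__":
    x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
    phi = shapley_values(toy_rater, x, base)
    print(phi)
    # Efficiency: the values sum to toy_rater(x) - toy_rater(base)
    print(sum(phi), toy_rater(x) - toy_rater(base))
```

Here the interaction term is split equally between the first and third features. Exact enumeration needs a number of model evaluations exponential in the feature count, which is why practical SHAP tools rely on model-specific algorithms or sampling.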
