Paul H. P. Hanel

AI
h-index: 29
Papers: 8
Citations: 360
Novelty: 26%
AI Score: 38

8 Papers

AI · Mar 21, 2023
Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity

Jennifer Haase, Paul H. P. Hanel

A widespread view is that Artificial Intelligence cannot be creative. We tested this assumption by comparing human-generated ideas with those generated by six Generative Artificial Intelligence (GAI) chatbots: alpa.ai, Copy.ai, ChatGPT (versions 3 and 4), Studio.ai, and YouChat. Humans and a specifically trained AI independently assessed the quality and quantity of ideas. We found no qualitative difference between AI and human-generated creativity, although there are differences in how ideas are generated. Interestingly, 9.4 percent of humans were more creative than the most creative GAI, GPT-4. Our findings suggest that GAIs are valuable assistants in the creative process. Continued research and development of GAI in creative tasks is crucial to fully understand this technology's potential benefits and drawbacks in shaping the future of creativity. Finally, we discuss the question of whether GAIs are capable of being truly creative.

AI · Jan 29
Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks

Jennifer Haase, Jana Gonnermann-Müller, Paul H. P. Hanel et al.

How much of LLM output variance is explained by prompts versus model choice versus stochasticity through sampling? We answer this by evaluating 12 LLMs on 10 creativity prompts with 100 samples each (N = 12,000). For output quality (originality), prompts explain 36.43% of variance, comparable to model choice (40.94%). But for output quantity (fluency), model choice (51.25%) and within-LLM variance (33.70%) dominate, with prompts explaining only 4.22%. Prompts are powerful levers for steering output quality, but given the substantial within-LLM variance (10-34%), single-sample evaluations risk conflating sampling noise with genuine prompt or model effects.
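The abstract does not say how the variance shares were estimated. As a rough illustration only, the sketch below splits total variance in a balanced model × prompt × sample design into sums of squares for model, prompt, their interaction, and within-cell sampling noise. The function name `variance_shares` and the input layout are assumptions for this example, not the authors' method.

```python
from statistics import mean

def variance_shares(scores):
    """Split total variance into model, prompt, interaction, and within-cell shares.

    scores: dict mapping (model, prompt) -> list of sample scores.
    Assumes a balanced design: every cell has the same number of samples.
    """
    models = sorted({m for m, _ in scores})
    prompts = sorted({p for _, p in scores})
    n = len(next(iter(scores.values())))  # samples per cell

    grand = mean(x for cell in scores.values() for x in cell)
    model_mean = {m: mean(x for p in prompts for x in scores[(m, p)]) for m in models}
    prompt_mean = {p: mean(x for m in models for x in scores[(m, p)]) for p in prompts}
    cell_mean = {key: mean(cell) for key, cell in scores.items()}

    ss = {
        "model": n * len(prompts) * sum((model_mean[m] - grand) ** 2 for m in models),
        "prompt": n * len(models) * sum((prompt_mean[p] - grand) ** 2 for p in prompts),
        "interaction": n * sum(
            (cell_mean[(m, p)] - model_mean[m] - prompt_mean[p] + grand) ** 2
            for m in models for p in prompts
        ),
        # Within-cell spread: same model, same prompt, different samples.
        "within": sum((x - cell_mean[key]) ** 2 for key, cell in scores.items() for x in cell),
    }
    total = sum(ss.values())
    if total == 0:
        return {name: 0.0 for name in ss}
    return {name: value / total for name, value in ss.items()}
```

With a single sample per cell the within-cell term is zero by construction, so sampling noise cannot be separated from prompt or model effects, which is the risk the abstract points to.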

61.0 · CY · Mar 28
From Influencers to Lecturers: Understanding Public Attitudes Toward Digital vs. Traditional Jobs

Paul H. P. Hanel, Gabriel Lins de Holanda Coelho, Jennifer Haase

The rapid expansion of high-speed internet has led to the emergence of new digital jobs, such as digital influencers, fitness models, and adult models who share content on subscription-based social media platforms. Across two experiments involving 1,002 participants, we combined theories from social psychology and information systems to investigate how digital jobs are perceived compared to matched established jobs, and predictors of attitudes toward those jobs (e.g., symbolic threat, contact, perceived usefulness). We found that individuals in digital professions were perceived less favorably and as less hard-working than those in matched established jobs. Digital jobs were also regarded as more threatening to societal values and less useful. The relation between job type and attitudes toward these jobs was partially mediated by contact with people working in these jobs, perceived usefulness, perception of hard work, and symbolic threat. These effects were consistent across both experiments and across various moderators: openness to new experiences, attitudes toward digitalization, political orientation, and age. Among the nine jobs examined, lecturers were perceived most positively, while adult models were viewed least positively. Overall, our findings demonstrate that integrating theories from social psychology and information systems can enhance our understanding of how attitudes are formed.

CL · Apr 10, 2025
Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability

Jennifer Haase, Paul H. P. Hanel, Sebastian Pokutta

Following the widespread adoption of ChatGPT in early 2023, numerous studies reported that large language models (LLMs) can match or even surpass human performance in creative tasks. However, it remains unclear whether LLMs have become more creative over time, and how consistent their creative output is. In this study, we evaluated 14 widely used LLMs -- including GPT-4, Claude, Llama, Grok, Mistral, and DeepSeek -- across two validated creativity assessments: the Divergent Association Task (DAT) and the Alternative Uses Task (AUT). Contrary to expectations, we found no evidence of increased creative performance over the past 18-24 months, with GPT-4 performing worse than in previous studies. For the more widely used AUT, all models performed on average better than the average human, with GPT-4o and o3-mini performing best. However, only 0.28% of LLM-generated responses reached the top 10% of human creativity benchmarks. Beyond inter-model differences, we document substantial intra-model variability: the same LLM, given the same prompt, can produce outputs ranging from below-average to original. This variability has important implications for both creativity research and practical applications. Ignoring such variability risks misjudging the creative potential of LLMs, either inflating or underestimating their capabilities. The choice of prompts affected LLMs differently. Our findings underscore the need for more nuanced evaluation frameworks and highlight the importance of model selection, prompt design, and repeated assessment when using Generative AI (GenAI) tools in creative contexts.

CL · May 14, 2025
S-DAT: A Multilingual, GenAI-Driven Framework for Automated Divergent Thinking Assessment

Jennifer Haase, Paul H. P. Hanel, Sebastian Pokutta

This paper introduces S-DAT (Synthetic-Divergent Association Task), a scalable, multilingual framework for automated assessment of divergent thinking (DT), a core component of human creativity. Traditional creativity assessments are often labor-intensive, language-specific, and reliant on subjective human ratings, limiting their scalability and cross-cultural applicability. In contrast, S-DAT leverages large language models and advanced multilingual embeddings to compute semantic distance -- a language-agnostic proxy for DT. We evaluate S-DAT across eleven diverse languages, including English, Spanish, German, Russian, Hindi, and Japanese (Kanji, Hiragana, Katakana), demonstrating robust and consistent scoring across linguistic contexts. Unlike prior DAT approaches, the S-DAT shows convergent validity with other DT measures and correct discriminant validity with convergent thinking. This cross-linguistic flexibility allows for more inclusive, global-scale creativity research, addressing key limitations of earlier approaches. S-DAT provides a powerful tool for fairer, more comprehensive evaluation of cognitive flexibility in diverse populations and can be freely accessed online: https://sdat.iol.zib.de/.
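The abstract names semantic distance between embeddings as the scoring signal but gives no formula. A minimal sketch, assuming the score is the mean pairwise cosine distance between the embedding vectors of a participant's words: the function names are hypothetical, and the vectors are supplied by the caller, since no specific embedding model or S-DAT API is assumed here.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def mean_pairwise_distance(vectors):
    """Average cosine distance over all unordered pairs of word vectors.

    vectors: list of at least two embedding vectors, one per submitted word.
    A higher value means the words are, on average, further apart semantically.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)
```

Because the computation only sees vectors, the same scoring step applies to any language for which the embedding model yields comparable vectors, which is the sense in which the abstract calls semantic distance language-agnostic.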

SE · Nov 19, 2021
Understanding Developers Well-Being and Productivity: a 2-year Longitudinal Analysis during the COVID-19 Pandemic

Daniel Russo, Paul H. P. Hanel, Niels van Berkel

The COVID-19 pandemic has brought significant and enduring shifts in various aspects of life, including increased flexibility in work arrangements. In a longitudinal study spanning 24 months, with six measurement points from April 2020 to April 2022, we explore changes in well-being, productivity, social contacts, and needs of software engineers during this time. Our findings indicate systematic changes in various variables. For example, well-being and quality of social contacts increased while emotional loneliness decreased as lockdown measures were relaxed. Conversely, people's boredom and productivity remained stable. Furthermore, a preliminary investigation into the future of work at the end of the pandemic revealed a consensus among developers in favor of hybrid work arrangements. We also discovered that prior job changes and low job satisfaction were consistently linked to intentions to change jobs if current work conditions do not meet developers' needs. This highlights the need for software organizations to adapt to various work arrangements to remain competitive employers. Building upon our findings and the existing literature, we introduce the Integrated Job Demands-Resources and Self-Determination (IJARS) Model as a comprehensive framework to explain the well-being and productivity of software engineers during the COVID-19 pandemic.

SE · Jul 16, 2021
Satisfaction and Performance of Software Developers during Enforced Work from Home in the COVID-19 Pandemic

Daniel Russo, Paul H. P. Hanel, Seraphina Altnickel et al.

Following the onset of the COVID-19 pandemic and subsequent lockdowns, the daily lives of software engineers were heavily disrupted as they were abruptly forced to work remotely from home. To better understand and contrast typical working days in this new reality with work in pre-pandemic times, we conducted one exploratory (N = 192) and one confirmatory study (N = 290) with software engineers recruited remotely. Specifically, we build on self-determination theory to evaluate whether and how specific activities are associated with software engineers' satisfaction and productivity. To explore the subject domain, we first ran a two-wave longitudinal study. We found that the time software engineers spent on specific activities (e.g., coding, bugfixing, helping others) while working from home was similar to pre-pandemic times. Also, the amount of time developers spent on each activity was unrelated to their general well-being, perceived productivity, and other variables such as basic needs. Our confirmatory study found that activity-specific variables (e.g., how much autonomy software engineers had during coding) do predict activity satisfaction and productivity, whereas activity-independent variables such as general resilience or a good work-life balance do not. Interestingly, we found that satisfaction and autonomy were significantly higher when software engineers were helping others and lower when they were bugfixing. Finally, we discuss implications for software engineers, management, and researchers. In particular, active company policies to support developers' need for autonomy, relatedness, and competence appear particularly effective in a WFH context.

CY · Jul 24, 2020
Predictors of Well-being and Productivity among Software Professionals during the COVID-19 Pandemic -- A Longitudinal Study

Daniel Russo, Paul H. P. Hanel, Seraphina Altnickel et al.

The COVID-19 pandemic has forced governments worldwide to impose movement restrictions on their citizens. Although critical to reducing the virus' reproduction rate, these restrictions come with far-reaching social and economic consequences. In this paper, we investigate the impact of these restrictions on an individual level among software engineers who were working from home. Although software professionals are accustomed to working with digital tools in their day-to-day work, not all of them are used to doing so remotely, and the abrupt and enforced work-from-home context has resulted in an unprecedented scenario for the software engineering community. In a two-wave longitudinal study (N = 192), we covered over 50 psychological, social, situational, and physiological factors that have previously been associated with well-being or productivity. Examples include anxiety, distractions, coping strategies, psychological and physical needs, office set-up, stress, and work motivation. This design allowed us to identify the variables that explained unique variance in well-being and productivity. Results include (1) the quality of social contacts positively predicted, and stress negatively predicted, an individual's well-being when controlling for other variables, consistently across both waves; (2) boredom and distractions predicted productivity negatively; (3) productivity was less strongly associated with all predictor variables at time two compared to time one, suggesting that software engineers adapted to the lockdown situation over time; and (4) longitudinal analyses did not provide evidence that any predictor variable causally explained variance in well-being and productivity. Overall, we conclude that working from home was not per se a significant challenge for software engineers.