CLOct 2, 2022
Cognitive modelling with multilayer networks: Insights, advancements and future challengesMassimo Stella, Salvatore Citraro, Giulio Rossetti et al.
The mental lexicon is a complex cognitive system representing information about the words/concepts that one knows. Decades of psychological experiments have shown that conceptual associations across multiple, interactive cognitive levels can greatly influence word acquisition, storage, and processing. How can semantic, phonological, syntactic, and other types of conceptual associations be mapped within a coherent mathematical framework to study how the mental lexicon works? We here review cognitive multilayer networks as a promising quantitative and interpretative framework for investigating the mental lexicon. Cognitive multilayer networks can map multiple types of information at once, thus capturing how different layers of associations might co-exist within the mental lexicon and influence cognitive processing. This review starts with a gentle introduction to the structure and formalism of multilayer networks. We then discuss quantitative mechanisms of psychological phenomena that could not be observed in single-layer networks and were only unveiled by combining multiple layers of the lexicon: (i) multiplex viability highlights language kernels and facilitative effects of knowledge processing in healthy and clinical populations; (ii) multilayer community detection enables contextual meaning reconstruction depending on psycholinguistic features; (iii) layer analysis can mediate latent interactions of mediation, suppression and facilitation for lexical access. By outlining novel quantitative perspectives where multilayer networks can shed light on cognitive knowledge representations, also in next-generation brain/mind models, we discuss key limitations and promising directions for cutting-edge future research.
AIAug 1, 2024
Y Social: an LLM-powered Social Media Digital TwinGiulio Rossetti, Massimo Stella, Rémy Cazabet et al.
In this paper we introduce Y, a new-generation digital twin designed to replicate an online social media platform. Digital twins are virtual replicas of physical systems that allow for advanced analyses and experimentation. In the case of social media, a digital twin such as Y provides a powerful tool for researchers to simulate and understand complex online interactions. {\tt Y} leverages state-of-the-art Large Language Models (LLMs) to replicate sophisticated agent behaviors, enabling accurate simulations of user interactions, content dissemination, and network dynamics. By integrating these aspects, Y offers valuable insights into user engagement, information spread, and the impact of platform policies. Moreover, the integration of LLMs allows Y to generate nuanced textual content and predict user responses, facilitating the study of emergent phenomena in online environments. To better characterize the proposed digital twin, in this paper we describe the rationale behind its implementation, provide examples of the analyses that can be performed on the data it enables to be generated, and discuss its relevance for multidisciplinary research.
CLApr 13, 2023
Towards hypergraph cognitive networks as feature-rich models of knowledgeSalvatore Citraro, Simon De Deyne, Massimo Stella et al.
Semantic networks provide a useful tool to understand how related concepts are retrieved from memory. However, most current network approaches use pairwise links to represent memory recall patterns. Pairwise connections neglect higher-order associations, i.e. relationships between more than two concepts at a time. These higher-order interactions might covariate with (and thus contain information about) how similar concepts are along psycholinguistic dimensions like arousal, valence, familiarity, gender and others. We overcome these limits by introducing feature-rich cognitive hypergraphs as quantitative models of human memory where: (i) concepts recalled together can all engage in hyperlinks involving also more than two concepts at once (cognitive hypergraph aspect), and (ii) each concept is endowed with a vector of psycholinguistic features (feature-rich aspect). We build hypergraphs from word association data and use evaluation methods from machine learning features to predict concept concreteness. Since concepts with similar concreteness tend to cluster together in human memory, we expect to be able to leverage this structure. Using word association data from the Small World of Words dataset, we compared a pairwise network and a hypergraph with N=3586 concepts/nodes. Interpretable artificial intelligence models trained on (1) psycholinguistic features only, (2) pairwise-based feature aggregations, and on (3) hypergraph-based aggregations show significant differences between pairwise and hypergraph links. Specifically, our results show that higher-order and feature-rich hypergraph models contain richer information than pairwise networks leading to improved prediction of word concreteness. The relation with previous studies about conceptual clustering and compartmentalisation in associative knowledge and human memory are discussed.
CLNov 3, 2025
Math anxiety and associative knowledge structure are entwined in psychology students but not in Large Language Models like GPT-3.5 and GPT-4oLuciana Ciringione, Emma Franchino, Simone Reigl et al.
Math anxiety poses significant challenges for university psychology students, affecting their career choices and overall well-being. This study employs a framework based on behavioural forma mentis networks (i.e. cognitive models that map how individuals structure their associative knowledge and emotional perceptions of concepts) to explore individual and group differences in the perception and association of concepts related to math and anxiety. We conducted 4 experiments involving psychology undergraduates from 2 samples (n1 = 70, n2 = 57) compared against GPT-simulated students (GPT-3.5: n2 = 300; GPT-4o: n4 = 300). Experiments 1, 2, and 3 employ individual-level network features to predict psychometric scores for math anxiety and its facets (observational, social and evaluational) from the Math Anxiety Scale. Experiment 4 focuses on group-level perceptions extracted from human students, GPT-3.5 and GPT-4o's networks. Results indicate that, in students, positive valence ratings and higher network degree for "anxiety", together with negative ratings for "math", can predict higher total and evaluative math anxiety. In contrast, these models do not work on GPT-based data because of differences in simulated networks and psychometric scores compared to humans. These results were also reconciled with differences found in the ways that high/low subgroups of simulated and real students framed semantically and emotionally STEM concepts. High math-anxiety students collectively framed "anxiety" in an emotionally polarising way, absent in the negative perception of low math-anxiety students. "Science" was rated positively, but contrasted against the negative perception of "math". These findings underscore the importance of understanding concept perception and associations in managing students' math anxiety.
AIApr 18
LLMs can persuade only psychologically susceptible humans on societal issues, via trust in AI and emotional appeals, amid logical fallaciesAlexis Carrillo, Salvatore Citraro, Ali Aghazhadeh Ardebili et al.
Scarce longitudinal evidence examines LLMs' persuasiveness and humanness along time-evolving psychological frameworks. We introduce Talk2AI, a longitudinal framework quantifying psycho-social, reasoning and affective dimensions of LLMs' persuasiveness about polarizing societal topics. In a four-way longitudinal setup, Talk2AI's 770 participants engaged in structured conversations with one of four leading LLMs on topics like climate change, social media misinformation, and math anxiety. This produced 3,080 conversations over 60,000 turns. After each wave, participants reported conviction in their initial topic stance, perceived opinion change, LLM's perceived humanness, a self-donation to the topic and a textual explanation. Feedback time series showed longitudinal inertia in convictions, indicating some human anchoring to initial opinions even after repeated exposure to AI-generated arguments. Interestingly, NLP analyses revealed that both humans and LLMs relied on fallacious reasoning in 1 conversational quip every 6, countering the ``LLMs as superior systems" stereotype behind LLMs' cognitive surrender. LLMs' perceived humanness was most learnable from sociodemographic, psychological and engagement features ($R^2=0.44$), followed by opinion change ($R^2=0.34$), conviction ($R^2=0.26$) and personal endowment ($R^2=0.24$). Crucially, explainable AI (XAI) indicated: (i) the presence of individuals more susceptible to LLM-based opinion changes; (ii) psychological susceptibility to LLM-convincing consisted of having more trust in LLMs, being more agreeable and extraverted and with a higher need for cognition. A multiverse approach with mixed-effects models confirmed XAI results, alongside strong individual differences. Talk2AI provides a grounded framework and evidence for detecting how GenAI can influence human opinions via multiple psycho-social pathways in AI-human digital platforms.
AIApr 30Code
The TEA Nets framework combines AI and cognitive network science to model targets, events and actors in textSebastiano Franchini, Alexis Carrillo, Edoardo Sebastiano De Duro et al.
We introduce Target-Event-Agent Networks (TEA Nets) as a computational framework to extract subjects (``Agents"), verbs (``Events"), and objects (``Targets") from texts. Grounded in cognitive network science and artificial intelligence, TEA Nets are implemented as an open-source Python library. We test TEA Nets in three case studies, demonstrating the framework's ability to perform interpretable emotion detection, semantic frame analyses, and linguistic inquiries across conspiracy texts and textual responses generated by LLMs. In the LOCO conspiracy corpus, TEA Nets revealed that highly conspiratorial narratives (4,227 texts) linked personal pronouns (``I", ``you", ``we") with the same actions twice as frequently as low-similarity conspiracy narratives. High-conspiracy narratives connected person-focused elements (``you", ``people") through actions eliciting anger above the random baseline ($z = 2.63, p < .05$), a trend absent in low-similarity conspiracy narratives, which emphasized scientific actors (``researcher", ``scientist"). In the HOPE and CounseLLMe datasets of 212 (human) and 200 (LLM-based) psychotherapy transcripts, respectively, TEA Nets highlighted emotional differences. When expressing feelings, Claude 3 Haiku, GPT-3.5, and humans used sad words with higher frequency than random expectations but Haiku expressed sadness with lower emotional intensity than humans ($U = 1243.5, p = .036$). We discuss these differences in the context of psychotherapy training on LLM-simulated patients. Our results show that Target-Event-Agent Networks can extract relevant emotional, syntactic, and semantic insights from narratives, opening new avenues for text analysis with cognitive network science.
CLApr 14
The role of System 1 and System 2 semantic memory structure in human and LLM biasesKatherine Abramski, Giulio Rossetti, Massimo Stella
Implicit biases in both humans and large language models (LLMs) pose significant societal risks. Dual process theories propose that biases arise primarily from associative System 1 thinking, while deliberative System 2 thinking mitigates bias, but the cognitive mechanisms that give rise to this phenomenon remain poorly understood. To better understand what underlies this duality in humans, and possibly in LLMs, we model System 1 and System 2 thinking as semantic memory networks with distinct structures, built from comparable datasets generated by both humans and LLMs. We then investigate how these distinct semantic memory structures relate to implicit gender bias using network-based evaluation metrics. We find that semantic memory structures are irreducible only in humans, suggesting that LLMs lack certain types of human-like conceptual knowledge. Moreover, semantic memory structure relates consistently to implicit bias only in humans, with lower levels of bias in System~2 structures. These findings suggest that certain types of conceptual knowledge contribute to bias regulation in humans, but not in LLMs, highlighting fundamental differences between human and machine cognition.
CLJan 12
How to predict creativity ratings from written narratives: A comparison of co-occurrence and textual forma mentis networksRoberto Passaro, Edith Haim, Massimo Stella
This tutorial paper provides a step-by-step workflow for building and analysing semantic networks from short creative texts. We introduce and compare two widely used text-to-network approaches: word co-occurrence networks and textual forma mentis networks (TFMNs). We also demonstrate how they can be used in machine learning to predict human creativity ratings. Using a corpus of 1029 short stories, we guide readers through text preprocessing, network construction, feature extraction (structural measures, spreading-activation indices, and emotion scores), and application of regression models. We evaluate how network-construction choices influence both network topology and predictive performance. Across all modelling settings, TFMNs consistently outperformed co-occurrence networks through lower prediction errors (best MAE = 0.581 for TFMN, vs 0.592 for co-occurrence with window size 3). Network-structural features dominated predictive performance (MAE = 0.591 for TFMN), whereas emotion features performed worse (MAE = 0.711 for TFMN) and spreading-activation measures contributed little (MAE = 0.788 for TFMN). This paper offers practical guidance for researchers interested in applying network-based methods for cognitive fields like creativity research. we show when syntactic networks are preferable to surface co-occurrence models, and provide an open, reproducible workflow accessible to newcomers in the field, while also offering deeper methodological insight for experienced researchers.
CLFeb 16
Cognitive networks reconstruct mindsets about STEM subjects and educational contexts in almost 1000 high-schoolers, University students and LLM-based digital twinsFrancesco Gariboldi, Emma Franchino, Edith Haim et al.
Attitudes toward STEM develop from the interaction of conceptual knowledge, educational experiences, and affect. Here we use cognitive network science to reconstruct group mindsets as behavioural forma mentis networks (BFMNs). In this case, nodes are cue words and free associations, edges are empirical associative links, and each concept is annotated with perceived valence. We analyse BFMNs from N = 994 observations spanning high school students, university students, and early-career STEM experts, alongside LLM (GPT-oss) "digital twins" prompted to emulate comparable profiles. Focusing also on semantic neighbourhoods ("frames") around key target concepts (e.g., STEM subjects or educational actors/places), we quantify frames in terms of valence auras, emotional profiles, network overlap (Jaccard similarity), and concreteness relative to null baselines. Across student groups, science and research are consistently framed positively, while their core quantitative subjects (mathematics and statistics) exhibit more negative and anxiety related auras, amplified in higher math-anxiety subgroups, evidencing a STEM-science cognitive and emotional dissonance. High-anxiety frames are also less concrete than chance, suggesting more abstract and decontextualised representations of threatening quantitative domains. Human networks show greater overlapping between mathematics and anxiety than GPT-oss. The results highlight how BFMNs capture cognitive-affective signatures of mindsets towards the target domains and indicate that LLM-based digital twins approximate cultural attitudes but miss key context-sensitive, experience-based components relevant to replicate human educational anxiety.
AIApr 30
Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIsNaomi Esposito, Anthony Tricarico, Luisa Porzio et al.
To enhance LLMs' impact on math education, we need data on their mathematical prowess and biases across prompts. To fill this gap, we introduce MEDS (Math Education Digital Shadows) as a dataset mapping how large language models reason about and report mathematics across human- and AI-like conditions. MEDS involves 28,000 personas from 14 LLMs (from families like Mistral, Qwen, DeepSeek, Granite, Phi and Grok) shadowing either humans or AI assistants. Each record/shadow includes a set of prompts along with psychological/sociodemographic persona metadata and four types of math tasks: (i) open math interview, (ii) three psychometric tests about math perceptions with explanations, (iii) cognitive networks capturing math attitudes, and (iv) 18 high-school math test questions together with their reasoning and confidence scores. MEDS differs from traditional score-only math benchmarks because it integrates concepts of self-efficacy, math anxiety, and cognitive network science besides math proficiency scores. Data validation shows that the sampled LLMs exhibit schema integrity and consistent personas, together with family-specific peculiarities like human-like negative math attitudes, logical fallacies, and math overconfidence. MEDS will benefit learning analytics experts, cognitive scientists, and developers of safer AI tutors in mathematics.
CLApr 30
Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behaviorAli Aghazadeh Ardebili, Massimo Stella
Large Language Models (LLMs) can strongly shape social discourse, yet datasets investigating how LLM outputs vary across controlled social and contextual prompting remain sparse. Cognitive Digital Shadows (CDS) is a 190,000-record synthetic corpus supporting analyses of LLM-generated discourse. Each CDS record is generated by one of 19 LLMs, prompted to shadow either a human persona or an AI-assistant role. CDS contains LLM responses on 4 controversial societal topics: vaccines/healthcare, social media disinformation, the gender gap in science, and STEM stereotypes. Persona-conditioned records encode 17 sociodemographic and psychological attributes, providing data linking LLMs' prompts, language, stances and reasoning. Texts are validated for topic anchoring and can support emotional analyses via interpretable NLP (e.g. textual forma mentis networks). CDS is enriched by a pooling platform with user-friendly dashboards, enabling easy, interactive group-level comparisons of emotional and semantic framing across personas, topics and models. The CDS prompting framework supports future audits of LLMs' bias, social sensitivity and alignment.
HCApr 6
Talk2AI: A Longitudinal Dataset of Human--AI Persuasive ConversationsAlexis Carrillo, Enrique Taietta, Ali Aghazadeh Ardebili et al.
Talk2AI is a large-scale longitudinal dataset of 3,080 conversations (totaling 30,800 turns) between human participants and Large Language Models (LLMs), designed to support research on persuasion, opinion change, and human-AI interaction. The corpus was collected from 770 profiled Italian adults across four weekly sessions in Spring 2025, using a within-subject design in which each participant conversed with a single model (GPT-4o, Claude Sonnet 3.7, DeepSeek-chat V3, or Mistral Large) on three socially relevant topics: climate change, math anxiety, and health misinformation. Each conversation is linked to rich contextual data, including sociodemographic characteristics and psychometric profiles. After each session, participants reported on opinion change, conviction stability, perceived humanness of the AI, and behavioral intentions, enabling fine-grained longitudinal analysis of how AI-mediated dialogue shapes beliefs and attitudes over time.
CLDec 2, 2024
The "LLM World of Words" English free association norms generated by large language modelsKatherine Abramski, Riccardo Improta, Giulio Rossetti et al.
Free associations have been extensively used in cognitive psychology and linguistics for studying how conceptual knowledge is organized. Recently, the potential of applying a similar approach for investigating the knowledge encoded in LLMs has emerged, specifically as a method for investigating LLM biases. However, the absence of large-scale LLM-generated free association norms that are comparable with human-generated norms is an obstacle to this new research direction. To address this limitation, we create a new dataset of LLM-generated free association norms modeled after the "Small World of Words" (SWOW) human-generated norms consisting of approximately 12,000 cue words. We prompt three LLMs, namely Mistral, Llama3, and Haiku, with the same cues as those in the SWOW norms to generate three novel comparable datasets, the "LLM World of Words" (LWOW). Using both SWOW and LWOW norms, we construct cognitive network models of semantic memory that represent the conceptual knowledge possessed by humans and LLMs. We demonstrate how these datasets can be used for investigating implicit biases in humans and LLMs, such as the harmful gender stereotypes that are prevalent both in society and LLM outputs.
HCFeb 28, 2025
Measuring and identifying factors of individuals' trust in Large Language ModelsEdoardo Sebastiano De Duro, Giuseppe Alessandro Veltri, Hudson Golino et al.
Large Language Models (LLMs) can engage in human-looking conversational exchanges. Although conversations can elicit trust between users and LLMs, scarce empirical research has examined trust formation in human-LLM contexts, beyond LLMs' trustworthiness or human trust in AI in general. Here, we introduce the Trust-In-LLMs Index (TILLMI) as a new framework to measure individuals' trust in LLMs, extending McAllister's cognitive and affective trust dimensions to LLM-human interactions. We developed TILLMI as a psychometric scale, prototyped with a novel protocol we called LLM-simulated validity. The LLM-based scale was then validated in a sample of 1,000 US respondents. Exploratory Factor Analysis identified a two-factor structure. Two items were then removed due to redundancy, yielding a final 6-item scale with a 2-factor structure. Confirmatory Factor Analysis on a separate subsample showed strong model fit ($CFI = .995$, $TLI = .991$, $RMSEA = .046$, $p_{X^2} > .05$). Convergent validity analysis revealed that trust in LLMs correlated positively with openness to experience, extraversion, and cognitive flexibility, but negatively with neuroticism. Based on these findings, we interpreted TILLMI's factors as "closeness with LLMs" (affective dimension) and "reliance on LLMs" (cognitive dimension). Younger males exhibited higher closeness with- and reliance on LLMs compared to older women. Individuals with no direct experience with LLMs exhibited lower levels of trust compared to LLMs' users. These findings offer a novel empirical foundation for measuring trust in AI-driven verbal communication, informing responsible design, and fostering balanced human-AI collaboration.
AINov 30, 2024
Forma mentis networks predict creativity ratings of short texts via interpretable artificial intelligence in human and GPT-simulated ratersEdith Haim, Natalie Fischer, Salvatore Citraro et al.
Creativity is a fundamental skill of human cognition. We use textual forma mentis networks (TFMN) to extract network (semantic/syntactic associations) and emotional features from approximately one thousand human- and GPT3.5-generated stories. Using Explainable Artificial Intelligence (XAI), we test whether features relative to Mednick's associative theory of creativity can explain creativity ratings assigned by humans and GPT-3.5. Using XGBoost, we examine three scenarios: (i) human ratings of human stories, (ii) GPT-3.5 ratings of human stories, and (iii) GPT-3.5 ratings of GPT-generated stories. Our findings reveal that GPT-3.5 ratings differ significantly from human ratings not only in terms of correlations but also because of feature patterns identified with XAI methods. GPT-3.5 favours 'its own' stories and rates human stories differently from humans. Feature importance analysis with SHAP scores shows that: (i) network features are more predictive for human creativity ratings but also for GPT-3.5's ratings of human stories; (ii) emotional features played a greater role than semantic/syntactic network structure in GPT-3.5 rating its own stories. These quantitative results underscore key limitations in GPT-3.5's ability to align with human assessments of creativity. We emphasise the need for caution when using GPT-3.5 to assess and generate creative content, as it does not yet capture the nuanced complexity that characterises human creativity.
CLJul 13, 2025
SpreadPy: A Python tool for modelling spreading activation and superdiffusion in cognitive multiplex networksSalvatore Citraro, Edith Haim, Alessandra Carini et al.
We introduce SpreadPy as a Python library for simulating spreading activation in cognitive single-layer and multiplex networks. Our tool is designed to perform numerical simulations testing structure-function relationships in cognitive processes. By comparing simulation results with grounded theories in knowledge modelling, SpreadPy enables systematic investigations of how activation dynamics reflect cognitive, psychological and clinical phenomena. We demonstrate the library's utility through three case studies: (1) Spreading activation on associative knowledge networks distinguishes students with high versus low math anxiety, revealing anxiety-related structural differences in conceptual organization; (2) Simulations of a creativity task show that activation trajectories vary with task difficulty, exposing how cognitive load modulates lexical access; (3) In individuals with aphasia, simulated activation patterns on lexical networks correlate with empirical error types (semantic vs. phonological) during picture-naming tasks, linking network structure to clinical impairments. SpreadPy's flexible framework allows researchers to model these processes using empirically derived or theoretical networks, providing mechanistic insights into individual differences and cognitive impairments. The library is openly available, supporting reproducible research in psychology, neuroscience, and education research.
CLFeb 26, 2025
Cognitive networks highlight differences and similarities in the STEM mindsets of human and LLM-simulated trainees, experts and academicsEdith Haim, Lars van den Bergh, Cynthia S. Q. Siew et al.
Understanding attitudes towards STEM means quantifying the cognitive and emotional ways in which individuals, and potentially large language models too, conceptualise such subjects. This study uses behavioural forma mentis networks (BFMNs) to investigate the STEM-focused mindset, i.e. ways of associating and perceiving ideas, of 177 human participants and 177 artificial humans simulated by GPT-3.5. Participants were split in 3 groups - trainees, experts and academics - to compare the influence of expertise level on their mindset. The results revealed that human forma mentis networks exhibited significantly higher clustering coefficients compared to GPT-3.5, indicating that human mindsets displayed a tendency to form and close triads of conceptual associations while recollecting STEM ideas. Human experts, in particular, demonstrated robust clustering coefficients, reflecting better integration of STEM concepts into their cognitive networks. In contrast, GPT-3.5 produced sparser mindsets. Furthermore, both human and GPT mindsets framed mathematics in neutral or positive terms, differently from STEM high schoolers, researchers and other large language models sampled in other works. This research contributes to understanding how mindset structure can provide cognitive insights about memory structure and machine limitations.
HCNov 6, 2024
PhDGPT: Introducing a psychometric and linguistic dataset about how large language models perceive graduate students and professors in psychologyEdoardo Sebastiano De Duro, Enrique Taietta, Riccardo Improta et al.
Machine psychology aims to reconstruct the mindset of Large Language Models (LLMs), i.e. how these artificial intelligences perceive and associate ideas. This work introduces PhDGPT, a prompting framework and synthetic dataset that encapsulates the machine psychology of PhD researchers and professors as perceived by OpenAI's GPT-3.5. The dataset consists of 756,000 datapoints, counting 300 iterations repeated across 15 academic events, 2 biological genders, 2 career levels and 42 unique item responses of the Depression, Anxiety, and Stress Scale (DASS-42). PhDGPT integrates these psychometric scores with their explanations in plain language. This synergy of scores and texts offers a dual, comprehensive perspective on the emotional well-being of simulated academics, e.g. male/female PhD students or professors. By combining network psychometrics and psycholinguistic dimensions, this study identifies several similarities and distinctions between human and LLM data. The psychometric networks of simulated male professors do not differ between physical and emotional anxiety subscales, unlike humans. Other LLMs' personification can reconstruct human DASS factors with a purity up to 80%. Furthemore, LLM-generated personifications across different scenarios are found to elicit explanations lower in concreteness and imageability in items coding for anxiety, in agreement with past studies about human psychology. Our findings indicate an advanced yet incomplete ability for LLMs to reproduce the complexity of human psychometric data, unveiling convenient advantages and limitations in using LLMs to replace human participants. PhDGPT also intriguingly capture the ability for LLMs to adapt and change language patterns according to prompted mental distress contextual features, opening new quantitative opportunities for assessing the machine psychology of these artificial intelligences.
CLOct 28, 2025
A word association network methodology for evaluating implicit biases in LLMs compared to humansKatherine Abramski, Giulio Rossetti, Massimo Stella
As Large language models (LLMs) become increasingly integrated into our lives, their inherent social biases remain a pressing concern. Detecting and evaluating these biases can be challenging because they are often implicit rather than explicit in nature, so developing evaluation methods that assess the implicit knowledge representations of LLMs is essential. We present a novel word association network methodology for evaluating implicit biases in LLMs based on simulating semantic priming within LLM-generated word association networks. Our prompt-based approach taps into the implicit relational structures encoded in LLMs, providing both quantitative and qualitative assessments of bias. Unlike most prompt-based evaluation methods, our method enables direct comparisons between various LLMs and humans, providing a valuable point of reference and offering new insights into the alignment of LLMs with human cognition. To demonstrate the utility of our methodology, we apply it to both humans and several widely used LLMs to investigate social biases related to gender, religion, ethnicity, sexual orientation, and political party. Our results reveal both convergences and divergences between LLM and human biases, providing new perspectives on the potential risks of using LLMs. Our methodology contributes to a systematic, scalable, and generalizable framework for evaluating and comparing biases across multiple LLMs and humans, advancing the goal of transparent and socially responsible language technologies.
CYMay 22, 2023
Cognitive network science reveals bias in GPT-3, ChatGPT, and GPT-4 mirroring math anxiety in high-school studentsKatherine Abramski, Salvatore Citraro, Luigi Lombardi et al.
Large language models are becoming increasingly integrated into our lives. Hence, it is important to understand the biases present in their outputs in order to avoid perpetuating harmful stereotypes, which originate in our own flawed ways of thinking. This challenge requires developing new benchmarks and methods for quantifying affective and semantic bias, keeping in mind that LLMs act as psycho-social mirrors that reflect the views and tendencies that are prevalent in society. One such tendency that has harmful negative effects is the global phenomenon of anxiety toward math and STEM subjects. Here, we investigate perceptions of math and STEM fields provided by cutting-edge language models, namely GPT-3, Chat-GPT, and GPT-4, by applying an approach from network science and cognitive psychology. Specifically, we use behavioral forma mentis networks (BFMNs) to understand how these LLMs frame math and STEM disciplines in relation to other concepts. We use data obtained by probing the three LLMs in a language generation task that has previously been applied to humans. Our findings indicate that LLMs have an overall negative perception of math and STEM fields, with math being perceived most negatively. We observe significant differences across the three LLMs. We observe that newer versions (i.e. GPT-4) produce richer, more complex perceptions as well as less negative perceptions compared to older versions and N=159 high-school students. These findings suggest that advances in the architecture of LLMs may lead to increasingly less biased models that could even perhaps someday aid in reducing harmful stereotypes in society rather than perpetuating them.
CYJan 19, 2022
Writing about COVID-19 vaccines: Emotional profiling unravels how mainstream and alternative press framed AstraZeneca, Pfizer and vaccination campaignsAlfonso Semeraro, Salvatore Vilella, Giancarlo Ruffo et al.
Since their announcement in November 2020, COVID-19 vaccines were largely debated by the press and social media. With most studies focusing on COVID-19 disinformation in social media, little attention has been paid to how mainstream news outlets framed COVID-19 narratives compared to alternative sources. To fill this gap, we use cognitive network science and natural language processing to reconstruct time-evolving semantic and emotional frames of 5745 Italian news, that were massively re-shared on Facebook and Twitter, about COVID-19 vaccines. We found consistently high levels of trust/anticipation and less disgust in the way mainstream sources framed the general idea of "vaccine/vaccino". These emotions were crucially missing in the ways alternative sources framed COVID-19 vaccines. More differences were found within specific instances of vaccines. Alternative news included titles framing the AstraZeneca vaccine with strong levels of sadness, absent in mainstream titles. Mainstream news initially framed "Pfizer" along more negative associations with side effects than "AstraZeneca". With the temporary suspension of the latter, on March 15th 2021, we identified a semantic/emotional shift: Even mainstream article titles framed "AstraZeneca" as semantically richer in negative associations with side effects, while "Pfizer" underwent a positive shift in valence, mostly related to its higher efficacy. "Thrombosis" entered the frame of vaccines together with fearful conceptual associations, while "death" underwent an emotional shift, steering towards fear in alternative titles and losing its hopeful connotation in mainstream titles. Our findings expose crucial aspects of the emotional narratives around COVID-19 vaccines adopted by the press, highlighting the need to understand how alternative and mainstream media report vaccination news.
CLJan 13, 2022
Feature-rich multiplex lexical networks reveal mental strategies of early language learningSalvatore Citraro, Michael S. Vitevitch, Massimo Stella et al.
Knowledge in the human mind exhibits a dualistic vector/network nature. Modelling words as vectors is key to natural language processing, whereas networks of word associations can map the nature of semantic memory. We reconcile these paradigms - fragmented across linguistics, psychology and computer science - by introducing FEature-Rich MUltiplex LEXical (FERMULEX) networks. This novel framework merges structural similarities in networks and vector features of words, which can be combined or explored independently. Similarities model heterogenous word associations across semantic/syntactic/phonological aspects of knowledge. Words are enriched with multi-dimensional feature embeddings including frequency, age of acquisition, length and polysemy. These aspects enable unprecedented explorations of cognitive knowledge. Through CHILDES data, we use FERMULEX networks to model normative language acquisition by 1000 toddlers between 18 and 30 months. Similarities and embeddings capture word homophily via conformity, which measures assortative mixing via distance and features. Conformity unearths a language kernel of frequent/polysemous/short nouns and verbs key for basic sentence production, supporting recent evidence of children's syntactic constructs emerging at 30 months. This kernel is invisible to network core-detection and feature-only clustering: It emerges from the dual vector/network nature of words. Our quantitative analysis reveals two key strategies in early word learning. Modelling word acquisition as random walks on FERMULEX topology, we highlight non-uniform filling of communicative developmental inventories (CDIs). Conformity-based walkers lead to accurate (75%), precise (55%) and partially well-recalled (34%) predictions of early word learning in CDIs, providing quantitative support to previous empirical findings and developmental theories.
CLOct 28, 2021
Cognitive network science quantifies feelings expressed in suicide letters and Reddit mental health communitiesSimmi Marina Joseph, Salvatore Citraro, Virginia Morini et al.
Writing messages is key to expressing feelings. This study adopts cognitive network science to reconstruct how individuals report their feelings in clinical narratives like suicide notes or mental health posts. We achieve this by reconstructing syntactic/semantic associations between conceptsin texts as co-occurrences enriched with affective data. We transform 142 suicide notes and 77,000 Reddit posts from the r/anxiety, r/depression, r/schizophrenia, and r/do-it-your-own (r/DIY) forums into 5 cognitive networks, each one expressing meanings and emotions as reported by authors. These networks reconstruct the semantic frames surrounding 'feel', enabling a quantification of prominent associations and emotions focused around feelings. We find strong feelings of sadness across all clinical Reddit boards, added to fear r/depression, and replaced by joy/anticipation in r/DIY. Semantic communities and topic modelling both highlight key narrative topics of 'regret', 'unhealthy lifestyle' and 'low mental well-being'. Importantly, negative associations and emotions co-existed with trustful/positive language, focused on 'getting better'. This emotional polarisation provides quantitative evidence that online clinical boards possess a complex structure, where users mix both positive and negative outlooks. This dichotomy is absent in the r/DIY reference board and in suicide notes, where negative emotional associations about regret and pain persist but are overwhelmed by positive jargon addressing loved ones. Our quantitative comparisons provide strong evidence that suicide notes encapsulate different ways of expressing feelings compared to online Reddit boards, the latter acting more like personal diaries and relief valve. Our findings provide an interpretable, quantitative aid for supporting psychological inquiries of human feelings in digital and clinical settings.
CYOct 26, 2021
DASentimental: Detecting depression, anxiety and stress in texts via emotional recall, cognitive networks and machine learningAsra Fatima, Li Ying, Thomas Hills et al.
Most current affect scales and sentiment analysis on written text focus on quantifying valence (sentiment) -- the most primary dimension of emotion. However, emotions are broader and more complex than valence. Distinguishing negative emotions of similar valence could be important in contexts such as mental health. This project proposes a semi-supervised machine learning model (DASentimental) to extract depression, anxiety and stress from written text. First, we trained the model to spot how sequences of recalled emotion words by $N=200$ individuals correlated with their responses to the Depression Anxiety Stress Scale (DASS-21). Within the framework of cognitive network science, we model every list of recalled emotions as a walk over a networked mental representation of semantic memory, with emotions connected according to free associations in people's memory. Among several tested machine learning approaches, we find that a multilayer perceptron neural network trained on word sequences and semantic network distances can achieve state-of-art, cross-validated predictions for depression ($R = 0.7$), anxiety ($R = 0.44$) and stress ($R = 0.52$). Though limited by sample size, this first-of-its-kind approach enables quantitative explorations of key semantic dimensions behind DAS levels. We find that semantic distances between recalled emotions and the dyad "sad-happy" are crucial features for estimating depression levels but are less important for anxiety and stress. We also find that semantic distance of recalls from "fear" can boost the prediction of anxiety but it becomes redundant when the "sad-happy" dyad is considered. Adopting DASentimental as a semi-supervised learning tool to estimate DAS in text, we apply it to a dataset of 142 suicide notes. We conclude by discussing key directions for future research enabled by artificial intelligence detecting stress, anxiety and depression.
SIAug 31, 2021
Network psychometrics and cognitive network science open new ways for detecting, understanding and tackling the complexity of math anxiety: A reviewMassimo Stella
Math anxiety is a clinical pathology impairing cognitive processing in math-related contexts. Originally thought to affect only inexperienced, low-achieving students, recent investigations show how math anxiety is vastly diffused even among high-performing learners. This review of data-informed studies outlines math anxiety as a complex system that: (i) cripples well-being, self-confidence and information processing on both conscious and subconscious levels, (ii) can be transmitted by social interactions, like a pathogen, and worsened by distorted perceptions, (iii) affects roughly 20% of students in 63 out of 64 worldwide educational systems but correlates weakly with academic performance, and (iv) poses a concrete threat to students' well-being, computational literacy and career prospects in science. These patterns underline the crucial need to go beyond performance for estimating math anxiety. Recent advances with network psychometrics and cognitive network science provide ideal frameworks for detecting, interpreting and intervening upon such clinical condition. Merging education research, psychology and data science, the approaches reviewed here reconstruct psychological constructs as complex systems, represented either as multivariate correlation models (e.g. graph exploratory analysis) or as cognitive networks of semantic/emotional associations (e.g. free association networks or forma mentis networks). Not only can these interconnected networks detect otherwise hidden levels of math anxiety but - more crucially - they can unveil the specific layout of interacting factors, e.g. key sources and targets, behind math anxiety in a given cohort. As discussed here, these network approaches open concrete ways for unveiling students' perceptions, emotions and mental well-being, and can enable future powerful data-informed interventions untangling math anxiety.
SOC-PHMar 29, 2021
Cognitive networks identify the content of English and Italian popular posts about COVID-19 vaccines: Anticipation, logistics, conspiracy and loss of trustMassimo Stella, Michael S. Vitevitch, Federico Botta
Monitoring social discourse about COVID-19 vaccines is key to understanding how large populations perceive vaccination campaigns. We focus on 4765 unique popular tweets in English or Italian about COVID-19 vaccines between 12/2020 and 03/2021. One popular English tweet was liked up to 495,000 times, stressing how popular tweets affected cognitively massive populations. We investigate both text and multimedia in tweets, building a knowledge graph of syntactic/semantic associations in messages including visual features and indicating how online users framed social discourse mostly around the logistics of vaccine distribution. The English semantic frame of "vaccine" was highly polarised between trust/anticipation (towards the vaccine as a scientific asset saving lives) and anger/sadness (mentioning critical issues with dose administering). Semantic associations with "vaccine," "hoax" and conspiratorial jargon indicated the persistence of conspiracy theories and vaccines in massively read English posts (absent in Italian messages). The image analysis found that popular tweets with images of people wearing face masks used language lacking the trust and joy found in tweets showing people with no masks, indicating a negative affect attributed to face covering in social discourse. A behavioural analysis revealed a tendency for users to share content eliciting joy, sadness and disgust and to like less sad messages, highlighting an interplay between emotions and content diffusion beyond sentiment. With the AstraZeneca vaccine being suspended in mid March 2021, "Astrazeneca" was associated with trustful language driven by experts, but popular Italian tweets framed "vaccine" by crucially replacing earlier levels of trust with deep sadness. Our results stress how cognitive networks and innovative multimedia processing open new ways for reconstructing online perceptions about vaccines and trust.
CYFeb 25, 2021
Cognitive network science for understanding online social cognitions: A brief reviewMassimo Stella
Social media are digitalising massive amounts of users' cognitions in terms of timelines and emotional content. Such Big Data opens unprecedented opportunities for investigating cognitive phenomena like perception, personality and information diffusion but requires suitable interpretable frameworks. Since social media data come from users' minds, worthy candidates for this challenge are cognitive networks, models of cognition giving structure to mental conceptual associations. This work outlines how cognitive network science can open new, quantitative ways for understanding cognition through online media, like: (i) reconstructing how users semantically and emotionally frame events with contextual knowledge unavailable to machine learning, (ii) investigating conceptual salience/prominence through knowledge structure in social discourse; (iii) studying users' personality traits like openness-to-experience, curiosity, and creativity through language in posts; (iv) bridging cognitive/emotional content and social dynamics via multilayer networks comparing the mindsets of influencers and followers. These advancements combine cognitive-, network- and computer science to understand cognitive mechanisms in both digital and real-world settings but come with limitations concerning representativeness, individual variability and data integration. Such aspects are discussed along the ethical implications of manipulating socio-cognitive data. In the future, reading cognitions through networks and social media can expose cognitive biases amplified by online platforms and relevantly inform policy making, education and markets about massive, complex cognitive trends.
SIJul 23, 2020
Revealing semantic and emotional structure of suicide notes with cognitive network scienceAndreia Sofia Teixeira, Szymon Talaga, Trevor James Swanson et al.
Understanding the cognitive and emotional perceptions of people who commit suicide is one of the most sensitive scientific challenges. There are circumstances where people feel the need to leave something written, an artifact where they express themselves, registering their last words and feelings. These suicide notes are of utmost importance for better understanding the psychology of suicidal ideation. This work gives structure to the linguistic content of suicide notes, revealing interconnections between cognitive and emotional states of people who committed suicide. We build upon cognitive network science, psycholinguistics and semantic frame theory to introduce a network representation of the mindset expressed in suicide notes. Our cognitive network representation enables the quantitative analysis of the language in suicide notes through structural balance theory, semantic prominence and emotional profiling. Our results indicate that the emotional syntax connecting positively- and negatively-valenced terms gives rise to a degree of structural balance that is significantly higher than null models where the affective structure was randomized. We show that suicide notes are affectively compartmentalized such that positive concepts tend to cluster together and dominate the overall network structure. A key positive concept is "love", which integrates information relating the self to others in ways that are semantically prominent across suicide notes. The emotions populating the semantic frame of "love" combine joy and trust with anticipation and sadness, which connects with psychological theories about meaning-making and narrative psychology. Our results open new ways for understanding the structure of genuine suicide notes informing future research for suicide prevention.
SOC-PHFeb 28, 2018
Distance entropy cartography characterises centrality in complex networksMassimo Stella, Manlio De Domenico
We introduce distance entropy as a measure of homogeneity in the distribution of path lengths between a given node and its neighbours in a complex network. Distance entropy defines a new centrality measure whose properties are investigated for a variety of synthetic network models. By coupling distance entropy information with closeness centrality, we introduce a network cartography which allows one to reduce the degeneracy of ranking based on closeness alone. We apply this methodology to the empirical multiplex lexical network encoding the linguistic relationships known to English speaking toddlers. We show that the distance entropy cartography better predicts how children learn words compared to closeness centrality. Our results highlight the importance of distance entropy for gaining insights from distance patterns in complex networks.
SOC-PHFeb 20, 2018
Bots increase exposure to negative and inflammatory content in online social systemsMassimo Stella, Emilio Ferrara, Manlio De Domenico
Societies are complex systems which tend to polarize into sub-groups of individuals with dramatically opposite perspectives. This phenomenon is reflected -- and often amplified -- in online social networks where, however, humans are no more the only players, and co-exist alongside with social bots, i.e., software-controlled accounts. Analyzing large-scale social data collected during the Catalan referendum for independence on October 1, 2017, consisting of nearly 4 millions Twitter posts generated by almost 1 million users, we identify the two polarized groups of Independentists and Constitutionalists and quantify the structural and emotional roles played by social bots. We show that bots act from peripheral areas of the social system to target influential humans of both groups, bombarding Independentists with violent contents, increasing their exposure to negative and inflammatory narratives and exacerbating social conflict online. Our findings stress the importance of developing countermeasures to unmask these forms of automated social manipulation.
SOC-PHMay 26, 2017
Multiplex model of mental lexicon reveals explosive learning in humansMassimo Stella, Nicole M. Beckage, Markus Brede et al.
Word similarities affect language acquisition and use in a multi-relational way barely accounted for in the literature. We propose a multiplex network representation of this mental lexicon of word similarities as a natural framework for investigating large-scale cognitive patterns. Our representation accounts for semantic, taxonomic, and phonological interactions and it identifies a cluster of words which are used with greater frequency, are identified, memorised, and learned more easily, and have more meanings than expected at random. This cluster emerges around age 7 through an explosive transition not reproduced by null models. We relate this explosive emergence to polysemy -- redundancy in word meanings. Results indicate that the word cluster acts as a core for the lexicon, increasing both lexical navigability and robustness to linguistic degradation. Our findings provide quantitative confirmation of existing conjectures about core structure in the mental lexicon and the importance of integrating multi-relational word-word interactions in psycholinguistic frameworks.
SOC-PHSep 11, 2016
Multiplex lexical networks reveal patterns in early word acquisition in childrenMassimo Stella, Nicole M. Beckage, Markus Brede
Network models of language have provided a way of linking cognitive processes to the structure and connectivity of language. However, one shortcoming of current approaches is focusing on only one type of linguistic relationship at a time, missing the complex multi-relational nature of language. In this work, we overcome this limitation by modelling the mental lexicon of English-speaking toddlers as a multiplex lexical network, i.e. a multi-layered network where N=529 words/nodes are connected according to four types of relationships: (i) free associations, (ii) feature sharing, (iii) co-occurrence, and (iv) phonological similarity. We provide analysis of the topology of the resulting multiplex and then proceed to evaluate single layers as well as the full multiplex structure on their ability to predict empirically observed age of acquisition data of English speaking toddlers. We find that the emerging multiplex network topology is an important proxy of the cognitive processes of acquisition, capable of capturing emergent lexicon structure. In fact, we show that the multiplex topology is fundamentally more powerful than individual layers in predicting the ordering with which words are acquired. Furthermore, multiplex analysis allows for a quantification of distinct phases of lexical acquisition in early learners: while initially all the multiplex layers contribute to word learning, after about month 23 free associations take the lead in driving word acquisition.
SOC-PHApr 5, 2016
Mental Lexicon Growth Modelling Reveals the Multiplexity of the English LanguageMassimo Stella, Markus Brede
In this work we extend previous analyses of linguistic networks by adopting a multi-layer network framework for modelling the human mental lexicon, i.e. an abstract mental repository where words and concepts are stored together with their linguistic patterns. Across a three-layer linguistic multiplex, we model English words as nodes and connect them according to (i) phonological similarities, (ii) synonym relationships and (iii) free word associations. Our main aim is to exploit this multi-layered structure to explore the influence of phonological and semantic relationships on lexicon assembly over time. We propose a model of lexicon growth which is driven by the phonological layer: words are suggested according to different orderings of insertion (e.g. shorter word length, highest frequency, semantic multiplex features) and accepted or rejected subject to constraints. We then measure times of network assembly and compare these to empirical data about the age of acquisition of words. In agreement with empirical studies in psycholinguistics, our results provide quantitative evidence for the hypothesis that word acquisition is driven by features at multiple levels of organisation within language.
CLOct 16, 2014
Patterns in the English Language: Phonological Networks, Percolation and Assembly ModelsMassimo Stella, Markus Brede
In this paper we provide a quantitative framework for the study of phonological networks (PNs) for the English language by carrying out principled comparisons to null models, either based on site percolation, randomization techniques, or network growth models. In contrast to previous work, we mainly focus on null models that reproduce lower order characteristics of the empirical data. We find that artificial networks matching connectivity properties of the English PN are exceedingly rare: this leads to the hypothesis that the word repertoire might have been assembled over time by preferentially introducing new words which are small modifications of old words. Our null models are able to explain the "power-law-like" part of the degree distributions and generally retrieve qualitative features of the PN such as high clustering, high assortativity coefficient, and small-world characteristics. However, the detailed comparison to expectations from null models also points out significant differences, suggesting the presence of additional constraints in word assembly. Key constraints we identify are the avoidance of large degrees, the avoidance of triadic closure, and the avoidance of large non-percolating clusters.