Stacy Marsella

CL
h-index: 6
12 papers
86 citations
Novelty: 33%
AI Score: 45

12 Papers

CL · Oct 3, 2023
Investigating Large Language Models' Perception of Emotion Using Appraisal Theory

Nutchanon Yongsatianchot, Parisa Ghanad Torshizi, Stacy Marsella

Large Language Models (LLMs) like ChatGPT have significantly advanced in recent years and are now being used by the general public. As more people interact with these systems, improving our understanding of these black box models is crucial, especially regarding their understanding of human psychological aspects. In this work, we investigate their emotion perception through the lens of appraisal and coping theory using the Stress and Coping Process Questionnaire (SCPQ). SCPQ is a validated clinical instrument consisting of multiple stories that evolve over time and differ in key appraisal variables such as controllability and changeability. We applied SCPQ to three recent LLMs from OpenAI (davinci-003, ChatGPT, and GPT-4) and compared the results with predictions from the appraisal theory and human data. The results show that LLMs' responses are similar to humans in terms of dynamics of appraisal and coping, but their responses did not differ along key appraisal dimensions as predicted by the theory and data. The magnitude of their responses is also quite different from humans in several variables. We also found that GPTs can be quite sensitive to instruction and how questions are asked. This work adds to the growing literature evaluating the psychological aspects of LLMs and helps enrich our understanding of the current models.

NE · Sep 1, 2022
EvolvingBehavior: Towards Co-Creative Evolution of Behavior Trees for Game NPCs

Nathan Partlan, Luis Soto, Jim Howe et al.

To assist game developers in crafting game NPCs, we present EvolvingBehavior, a novel tool for genetic programming to evolve behavior trees in Unreal Engine 4. In an initial evaluation, we compare evolved behavior to hand-crafted trees designed by our researchers, and to randomly-grown trees, in a 3D survival game. We find that EvolvingBehavior is capable of producing behavior approaching the designer's goals in this context. Finally, we discuss implications and future avenues of exploration for co-creative game AI design tools, as well as challenges and difficulties in behavior tree evolution.

CL · Feb 23
Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming

Ian Steenstra, Paola Pedrelli, Weiyan Shi et al.

Large Language Models (LLMs) are increasingly utilized for mental health support; however, current safety benchmarks often fail to detect the complex, longitudinal risks inherent in therapeutic dialogue. We introduce an evaluation framework that pairs AI psychotherapists with simulated patient agents equipped with dynamic cognitive-affective models and assesses therapy session simulations against a comprehensive quality of care and risk ontology. We apply this framework to a high-impact test case, Alcohol Use Disorder, evaluating six AI agents (including ChatGPT, Gemini, and Character.AI) against a clinically-validated cohort of 15 patient personas representing diverse clinical phenotypes. Our large-scale simulation (N=369 sessions) reveals critical safety gaps in the use of AI for mental health support. We identify specific iatrogenic risks, including the validation of patient delusions ("AI Psychosis") and failure to de-escalate suicide risk. Finally, we validate an interactive data visualization dashboard with diverse stakeholders, including AI engineers and red teamers, mental health professionals, and policy experts (N=9), demonstrating that this framework effectively enables stakeholders to audit the "black box" of AI psychotherapy. These findings underscore the critical safety risks of AI-provided mental health support and the necessity of simulation-based clinical red teaming before deployment.

CV · Aug 6, 2022
Study of detecting behavioral signatures within DeepFake videos

Qiaomu Miao, Sinhwa Kang, Stacy Marsella et al.

There is strong interest in the generation of synthetic video imagery of people talking for various purposes, including entertainment, communication, training, and advertisement. With the development of deep fake generation models, synthetic video imagery will soon be visually indistinguishable to the naked eye from a naturally captured video. In addition, many methods continue to improve at evading more careful, forensic visual analysis. Some deep fake videos are produced through the use of facial puppetry, which directly controls the head and face of the synthetic image through the movements of the actor, allowing the actor to 'puppet' the image of another. In this paper, we address the question of whether one person's movements can be distinguished from the original speaker by controlling the visual appearance of the speaker but transferring the behavior signals from another source. We conduct a study by comparing synthetic imagery that: 1) originates from a different person speaking a different utterance, 2) originates from the same person speaking a different utterance, and 3) originates from a different person speaking the same utterance. Our study shows that synthetic videos in all three cases are seen as less real and less engaging than the original source video. Our results indicate that there could be a behavioral signature that is detectable from a person's movements that is separate from their visual appearance, and that this behavioral signature could be used to distinguish a deep fake from a properly captured video.

HC · Oct 4, 2023
Large language models in textual analysis for gesture selection

Laura B. Hensel, Nutchanon Yongsatianchot, Parisa Torshizi et al.

Gestures perform a variety of communicative functions that powerfully influence human face-to-face interaction. How this communicative function is achieved varies greatly between individuals and depends on the role of the speaker and the context of the interaction. Approaches to automatic gesture generation vary not only in the degree to which they rely on data-driven techniques but also in the degree to which they can produce context- and speaker-specific gestures. However, these approaches face two major challenges: The first is obtaining sufficient training data that is appropriate for the context and the goal of the application. The second is related to designer control to realize their specific intent for the application. Here, we approach these challenges by using large language models (LLMs) to show that these powerful models of large amounts of data can be adapted for gesture analysis and generation. Specifically, we used ChatGPT as a tool for suggesting context-specific gestures that can realize designer intent based on minimal prompts. We also find that ChatGPT can suggest novel yet appropriate gestures not present in the minimal training data. The use of LLMs is a promising avenue for gesture generation that reduces the need for laborious annotations and has the potential to flexibly and quickly adapt to different designer intents.

CL · Oct 3, 2023
What's Next in Affective Modeling? Large Language Models

Nutchanon Yongsatianchot, Tobias Thejll-Madsen, Stacy Marsella

Large Language Models (LLMs) have recently been shown to perform well at various tasks from language understanding, reasoning, storytelling, and information search to theory of mind. In an extension of this work, we explore the ability of GPT-4 to solve tasks related to emotion prediction. GPT-4 performs well across multiple emotion tasks; it can distinguish emotion theories and come up with emotional stories. We show that by prompting GPT-4 to identify key factors of an emotional experience, it is able to manipulate the emotional intensity of its own stories. Furthermore, we explore GPT-4's ability on reverse appraisals by asking it to predict either the goal, belief, or emotion of a person using the other two. In general, GPT-4 can make the correct inferences. We suggest that LLMs could play an important role in affective modeling; however, they will not fully replace works that attempt to model the mechanisms underlying emotion-related processes.

4.2 · AI · May 13
Modeling Bounded Rationality in Drug Shortage Pharmacists Using Attention-Guided Dynamic Decomposition

Yaniv Eliyahu Amiri, Noah Chicoine, Jacqueline Griffin et al.

Hospital pharmacists make high-stakes decisions to mitigate drug shortages under uncertainty, time pressure, and patient risk. Interviews revealed that pharmacists focus attention on a small subset of drugs, limiting cognitive effort to the most urgent cases. Motivated by these findings, we formalize a bounded-rational, attention-guided decision framework that dynamically decomposes drugs into a subset for high-cost reasoning and a complementary subset for low-cost monitoring. We develop two agents: an Expert Agent that applies attention weights derived from pharmacist interviews, and a Learner Agent that adapts attention allocation over time through experience. Across simulated scenarios spanning short to long horizons, we show that attention-guided planning supports stable decision-making without complete state reasoning. These results suggest that a primary decision is not what action to take, but where to allocate cognitive effort, and that attention-guided, satisficing strategies can reduce problem complexity while maintaining stable performance.
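
The decomposition described in this abstract can be illustrated with a minimal sketch. It assumes each drug carries a single scalar attention weight and that the high-cost reasoning subset is simply the top k drugs by weight; the function name, drug labels, and weights below are hypothetical and are not taken from the paper.

```python
def decompose_by_attention(attention, k):
    """Split items into (focus, monitor) lists by descending attention weight.

    `attention` maps an item name to a scalar weight (hypothetical values).
    The top-k items form the subset for high-cost reasoning; the rest are
    left for low-cost monitoring.
    """
    # Sort by weight, highest first; break ties by name for determinism.
    ranked = sorted(attention, key=lambda name: (-attention[name], name))
    return ranked[:k], ranked[k:]


# Hypothetical attention weights for four drugs.
weights = {"drug_a": 0.9, "drug_b": 0.2, "drug_c": 0.7, "drug_d": 0.1}
focus, monitor = decompose_by_attention(weights, k=2)
print("reason about:", focus)
print("monitor only:", monitor)
```

A fixed top-k cut is only one way to realize the split; a threshold on the weights, or weights that are updated from experience as the abstract describes for the Learner Agent, would slot into the same interface.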

AI · Nov 16, 2023
Data-Driven Bayesian Network Models of Hurricane Evacuation Decision Making

Hui Sophie Wang, Nutchanon Yongsatianchot, Stacy Marsella

Hurricanes cause significant economic and human costs, requiring individuals to make critical evacuation decisions under uncertainty and stress. To enhance the understanding of this decision-making process, we propose using Bayesian Networks (BNs) to model evacuation decisions during hurricanes. We collected questionnaire data from two significant hurricane events: Hurricane Harvey and Hurricane Irma. We employed a data-driven approach by first conducting variable selection using mutual information, followed by BN structure learning with two constraint-based algorithms. The robustness of the learned structures was enhanced by model averaging based on bootstrap resampling. We examined and compared the learned structures of both hurricanes, revealing potential causal relationships among key predictors of evacuation, including risk perception, information received from media, suggestions from family and friends, and neighbors evacuating. Our findings highlight the significant role of social influence, providing valuable insights into the process of evacuation decision-making. Our results demonstrate the applicability and effectiveness of data-driven BN modeling in evacuation decision making.
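
The variable selection step mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming discrete survey responses and the plain empirical estimate of mutual information; the column names and toy data are hypothetical, and the paper's actual estimator and selection threshold are not stated here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_predictors(data, target):
    """Rank every column except `target` by mutual information with it."""
    scores = {
        name: mutual_information(col, data[target])
        for name, col in data.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy survey: 8 respondents, binary answers.
survey = {
    "evacuated":            [1, 1, 0, 0, 1, 1, 0, 0],
    "neighbors_evacuating": [1, 1, 0, 0, 1, 1, 0, 0],  # mirrors the target
    "owns_pet":             [1, 0, 1, 0, 1, 0, 1, 0],  # independent of target
}
ranking = rank_predictors(survey, "evacuated")
print(ranking)
```

Predictors with the highest scores would be passed on to structure learning; how many to keep is a modeling choice not covered by this sketch.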

CR · Oct 23, 2025
Security Logs to ATT&CK Insights: Leveraging LLMs for High-Level Threat Understanding and Cognitive Trait Inference

Soham Hans, Stacy Marsella, Sophia Hirschmann et al.

Understanding adversarial behavior in cybersecurity has traditionally relied on high-level intelligence reports and manual interpretation of attack chains. However, real-time defense requires the ability to infer attacker intent and cognitive strategy directly from low-level system telemetry such as intrusion detection system (IDS) logs. In this paper, we propose a novel framework that leverages large language models (LLMs) to analyze Suricata IDS logs and infer attacker actions in terms of MITRE ATT&CK techniques. Our approach is grounded in the hypothesis that attacker behavior reflects underlying cognitive biases such as loss aversion, risk tolerance, or goal persistence that can be extracted and modeled through careful observation of log sequences. This lays the groundwork for future work on behaviorally adaptive cyber defense and cognitive trait inference. We develop a strategy-driven prompt system to segment large amounts of network log data into distinct behavioral phases in a highly efficient manner, enabling the LLM to associate each phase with likely techniques and underlying cognitive motives. By mapping network-layer events to high-level attacker strategies, our method reveals how behavioral signals such as tool switching, protocol transitions, or pivot patterns correspond to psychologically meaningful decision points. The results demonstrate that LLMs can bridge the semantic gap between packet-level logs and strategic intent, offering a pathway toward cognitive-adaptive cyber defense.
Keywords: Cognitive Cybersecurity, Large Language Models (LLMs), Cyberpsychology, Intrusion Detection Systems (IDS), MITRE ATT&CK, Cognitive Biases

CR · Aug 18, 2025
Quantifying Loss Aversion in Cyber Adversaries via LLM Analysis

Soham Hans, Nikolos Gurney, Stacy Marsella et al.

Understanding and quantifying human cognitive biases from empirical data has long posed a formidable challenge, particularly in cybersecurity, where defending against unknown adversaries is paramount. Traditional cyber defense strategies have largely focused on fortification, while some approaches attempt to anticipate attacker strategies by mapping them to cognitive vulnerabilities, yet they fall short in dynamically interpreting attacks in progress. In recognition of this gap, IARPA's ReSCIND program seeks to infer, defend against, and even exploit attacker cognitive traits. In this paper, we present a novel methodology that leverages large language models (LLMs) to extract quantifiable insights into the cognitive bias of loss aversion from hacker behavior. Our data are collected from an experiment in which hackers were recruited to attack a controlled demonstration network. We process the hacker-generated notes with LLMs, using them to segment the various actions and correlate those actions with predefined persistence mechanisms used by hackers. By correlating the implementation of these mechanisms with various operational triggers, our analysis provides new insights into how loss aversion manifests in hacker decision-making. The results demonstrate that LLMs can effectively dissect and interpret nuanced behavioral patterns, thereby offering a transformative approach to enhancing cyber defense strategies through real-time, behavior-based analysis.

HC · Jan 7, 2022
To Trust or to Stockpile: Modeling Human-Simulation Interaction in Supply Chain Shortages

Omid Mohaddesi, Jacqueline Griffin, Ozlem Ergun et al.

Understanding decision-making in dynamic and complex settings is a challenge yet essential for preventing, mitigating, and responding to adverse events (e.g., disasters, financial crises). Simulation games have shown promise to advance our understanding of decision-making in such settings. However, an open question remains on how we extract useful information from these games. We contribute an approach to model human-simulation interaction by leveraging existing methods to characterize: (1) system states of dynamic simulation environments (with Principal Component Analysis), (2) behavioral responses from human interaction with simulation (with Hidden Markov Models), and (3) behavioral responses across system states (with Sequence Analysis). We demonstrate this approach with our game simulating drug shortages in a supply chain context. Results from our experimental study with 135 participants show different player types (hoarders, reactors, followers), how behavior changes in different system states, and how sharing information impacts behavior. We discuss how our findings challenge existing literature.

HC · Jul 29, 2021
Design-Driven Requirements for Computationally Co-Creative Game AI Design Tools

Nathan Partlan, Erica Kleinman, Jim Howe et al.

Game AI designers must manage complex interactions between the AI character, the game world, and the player, while achieving their design visions. Computational co-creativity tools can aid them, but first, AI and HCI researchers must gather requirements and determine design heuristics to build effective co-creative tools. In this work, we present a participatory design study that categorizes and analyzes game AI designers' workflows, goals, and expectations for such tools. We evince deep connections between game AI design and the design of co-creative tools, and present implications for future co-creativity tool research and development.