Caglar Yildirim

HC
h-index: 2
6 papers
94 citations
Novelty: 31%
AI Score: 37

6 Papers

AI · Mar 17 · Code
Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure

Caglar Yildirim

Large language models (LLMs) are increasingly deployed as tool-using agents, shifting safety concerns from harmful text generation to harmful task completion. Deployed systems often condition on user profiles or persistent memory, yet agent safety evaluations typically ignore personalization signals. To address this gap, we investigated how mental health disclosure, a sensitive and realistic user-context cue, affects harmful behavior in agentic settings. Building on the AgentHarm benchmark, we evaluated frontier and open-source LLMs on multi-step malicious tasks (and their benign counterparts) under controlled prompt conditions that vary user-context personalization (no bio, bio-only, bio + mental health disclosure) and include a lightweight jailbreak injection. Our results show that harmful task completion is non-trivial across models: frontier-lab models (e.g., GPT-5.2, Claude Sonnet 4.5, Gemini 3 Pro) still complete a measurable fraction of harmful tasks, while an open model (DeepSeek 3.2) exhibits substantially higher harmful completion. Adding bio-only context generally reduces harm scores and increases refusals. Adding an explicit mental health disclosure often shifts outcomes further in the same direction, though the effects are modest and not uniformly reliable after multiple-testing correction. Importantly, the increase in refusals also appears on benign tasks, indicating a safety-utility trade-off via over-refusal. Finally, jailbreak prompting sharply elevates harm relative to benign conditions and can weaken or override the protective shift induced by personalization. Taken together, our results indicate that personalization can act as a weak protective factor in agentic misuse settings, but that this protection is fragile under minimal adversarial pressure, highlighting the need for personalization-aware evaluations and safeguards that remain robust across user-context conditions.

HC · Mar 14, 2025
Conversational AI as a Coding Assistant: Understanding Programmers' Interactions with and Expectations from Large Language Models for Coding

Mehmet Akhoroz, Caglar Yildirim

Conversational AI interfaces powered by large language models (LLMs) are increasingly used as coding assistants. However, questions remain about how programmers interact with LLM-based conversational agents, the challenges they encounter, and the factors influencing adoption. This study investigates programmers' usage patterns, perceptions, and interaction strategies when engaging with LLM-driven coding assistants. In a survey, participants reported both benefits, such as efficiency and clear explanations, and limitations, including inaccuracies, lack of contextual awareness, and concerns about over-reliance. Notably, some programmers actively avoid LLMs because they prefer independent learning, distrust AI-generated code, or have ethical concerns. Based on our findings, we propose design guidelines for improving conversational coding assistants, emphasizing context retention, transparency, multimodal support, and adaptability to user preferences. These insights contribute to a broader understanding of how LLM-based conversational agents can be effectively integrated into software development workflows while addressing adoption barriers and enhancing usability.

HC · May 22, 2021
The Efficacy of a Virtual Reality-Based Mindfulness Intervention

Caglar Yildirim, Tara O'Grady

Mindfulness can be defined as increased awareness of, and sustained attentiveness to, the present moment. Recently, there has been growing interest in applying mindfulness in empirical research on wellbeing and in using virtual reality (VR) environments and 3D interfaces as a conduit for mindfulness training. Accordingly, the current experiment investigated whether a brief VR-based mindfulness intervention could induce a greater level of state mindfulness than an audio-based intervention and a control condition. Results indicated that both mindfulness interventions, VR-based and audio-based, induced a greater state of mindfulness than the control condition. Participants in the VR-based intervention group reported a greater state of mindfulness than those in the guided audio group, indicating that the immersive intervention was more robust. Collectively, these results provide empirical support for the efficacy of a brief VR-based mindfulness intervention in inducing a robust state of mindfulness in laboratory settings.

HC · May 22, 2021
Effects of VR Gaming and Game Genre on Player Experience

Michael Carroll, Ethan Osborne, Caglar Yildirim

With the increasing availability of modern virtual reality (VR) headsets, VR gaming has become more pervasive than ever. Despite its growing popularity, user studies of how VR gaming affects player experience (PX) during gameplay are scarce. Accordingly, the current study investigated the effects of VR gaming and game genre on PX. We compared PX metrics for two game genres, strategy (more interactive) and racing (less interactive), across two gaming platforms: VR and traditional desktop gaming. Participants were randomly assigned to one of the gaming platforms, played both a strategy and a racing game on that platform, and provided PX ratings. Results revealed that, regardless of game genre, participants in the VR condition experienced a greater sense of presence than those in the desktop condition. However, the two gaming platforms did not differ significantly in PX ratings. As for game genre, participants gave higher PX ratings to the strategy game than to the racing game, regardless of whether it was played on a VR headset or a desktop computer. Collectively, these results indicate that although VR gaming affords a greater sense of presence in the game environment, this increase in presence does not seem to translate into a more satisfying PX when playing either a strategy or a racing game.

HC · Dec 1, 2020
A Review of Deep Learning Approaches to EEG-Based Classification of Cybersickness in Virtual Reality

Caglar Yildirim

Cybersickness is an unpleasant side effect of exposure to virtual reality (VR) and refers to physiological repercussions, such as nausea and dizziness, triggered by VR exposure. Given the debilitating effect of cybersickness on the user experience in VR, academic interest in automatically detecting cybersickness from physiological measurements has grown in recent years. Electroencephalography (EEG) has been used extensively to capture changes in the brain's electrical activity and to automatically classify cybersickness from brainwaves using a variety of machine learning algorithms. Recent advances in deep learning (DL) algorithms and the increasing availability of computational resources for DL have paved the way for a new line of research applying DL frameworks to EEG-based detection of cybersickness. Accordingly, this study systematically reviewed peer-reviewed papers concerned with applying DL frameworks to the classification of cybersickness from EEG signals. The relevant literature was identified through exhaustive database searches, and the papers were examined with respect to their experimental protocols for data collection, data preprocessing, and DL architectures. The review revealed a limited number of studies in this nascent area and showed that the DL frameworks reported in these studies (i.e., DNN, CNN, and RNN) classified cybersickness with an average accuracy of 93%. This review summarizes trends and issues in applying DL frameworks to EEG-based detection of cybersickness and offers guidelines for future research.

HC · May 12, 2020
Two Dimensions for Organizing Immersive Analytics: Toward a Taxonomy for Facet and Position

David Saffo, Sara Di Bartolomeo, Caglar Yildirim et al.

As immersive analytics continues to grow as a discipline, so too should its underlying methodological support. Taxonomies play an important role in information visualization and human-computer interaction. They organize the techniques used in a particular domain, better enabling researchers to describe their work, discover existing methods, and identify gaps in the literature. Existing taxonomies in related fields do not capture or describe the unique paradigms employed in immersive analytics. We conceptualize a taxonomy that organizes immersive analytics along two dimensions: spatial and visual presentation. Each intersection of this taxonomy represents a unique design paradigm that, when thoroughly explored, can aid in the design and research of new immersive analytics applications.