36.6 HC May 27
Fostering human learning is crucial for boosting human-AI synergy
Julian Berger, Jason W. Burton, Ralph Hertwig et al.
The collaboration between humans and artificial intelligence (AI) holds the promise of achieving superior outcomes compared to either acting alone, a phenomenon called human-AI synergy. Nevertheless, our understanding of the conditions that facilitate such human-AI synergy when humans are advised by AI remains limited. A recent meta-analysis showed that, on average, human-AI combinations do not outperform the better individual agent. We argue that this pessimistic conclusion arises from insufficient attention to human learning in the experimental designs. To substantiate this claim, we re-analyzed all 74 studies included in the original meta-analysis, yielding two new findings. First, most previous research overlooked design features that foster human learning, such as providing outcome feedback to participants. Second, our re-analysis demonstrated that studies providing outcome feedback show tentatively higher synergy than those without outcome feedback. Crucially, feedback paired with AI explanations tends to yield positive synergy, while explanations without feedback were linked to negative synergy, indicating that explanations increase synergy only when humans can learn to verify the AI's reliability through feedback. We conclude that the current literature underestimates the potential of human-AI collaboration because it predominantly relies on paradigms that do not facilitate human learning, thus hindering humans from effectively adapting their collaboration strategies. We therefore advocate for a paradigm shift in human-AI interaction research that explicitly addresses human learning and thus enhances our understanding of and support for successful human-AI collaboration.
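The comparison described in this abstract, the human-AI team against the better of the two agents acting alone, can be made concrete with a small sketch. This is an illustration only: the raw-difference measure and the accuracy values below are assumptions chosen for the example, not the effect-size computation used in the meta-analysis or the re-analysis.

```python
def synergy(human: float, ai: float, combined: float) -> float:
    """Team performance minus the better solo agent (higher-is-better metric).

    Positive values mean the human-AI team beat both the human alone
    and the AI alone; negative values mean it did not.
    """
    return combined - max(human, ai)


def augmentation(human: float, combined: float) -> float:
    """Team performance minus the human alone (a weaker benchmark than synergy)."""
    return combined - human


# Hypothetical accuracies, invented for illustration only.
human_acc, ai_acc, team_acc = 0.70, 0.80, 0.78

print("synergy:", synergy(human_acc, ai_acc, team_acc))
print("augmentation:", augmentation(human_acc, team_acc))
```

With these made-up numbers the team improves on the human alone but still falls short of the AI alone, which is the pattern the abstract refers to when it says combinations do not outperform the better individual agent.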
87.5 HC May 20
Stable Personas: Dual-Assessment of Temporal Stability in LLM-Based Human Simulation
Jana Gonnermann-Müller, Jennifer Haase, Nicolas Leins et al.
Large Language Models (LLMs) acting as artificial agents offer the potential for scalable behavioral research, yet their validity depends on whether LLMs can maintain stable personas across extended conversations. We address this point using a dual-assessment framework measuring both self-reported characteristics and observer-rated persona expression. Across two experiments testing four persona conditions (default, high, moderate, and low ADHD presentations), seven LLMs, and three semantically equivalent persona prompts, we examine between-conversation stability (3,473 conversations) and within-conversation stability (1,370 conversations and 18 turns). Self-reports remain highly stable both between and within conversations. However, observer ratings reveal a tendency for persona expressions to decline during extended conversations. These findings suggest that persona-instructed LLMs produce stable, persona-aligned self-reports, an important prerequisite for behavioral research, while identifying this regression tendency as a boundary condition for multi-agent social simulation.
HC Sep 28, 2023
"AI enhances our performance, I have no doubt this one will do the same": The Placebo effect is robust to negative descriptions of AI
Agnes M. Kloft, Robin Welsch, Thomas Kosch et al.
Heightened AI expectations facilitate performance in human-AI interactions through placebo effects. While lowering expectations to control for placebo effects is advisable, overly negative expectations could induce nocebo effects. In a letter discrimination task, we informed participants that an AI would either increase or decrease their performance by adapting the interface, but in reality, no AI was present in any condition. A Bayesian analysis showed that participants had high expectations and performed descriptively better irrespective of the AI description when a sham-AI was present. Using cognitive modeling, we could trace this advantage back to participants gathering more information. A replication study verified that negative AI descriptions do not alter expectations, suggesting that performance expectations with AI are biased and robust to negative verbal descriptions. We discuss the impact of user expectations on AI interactions and evaluation and provide a behavioral placebo marker for human-AI interaction.
64.9 HC May 25
Explaining Too Much? Understanding How Large Language Model Reasoning Traces Influence Performance and Metacognition
Daniela Fernandes, Daniel Buschek, Lev Tankelevitch et al.
Large Language Model interfaces are increasingly verbose, exposing intermediate reasoning traces alongside final answers. Traces are framed as transparency mechanisms, yet it is unclear how people use them to solve problems. We report a preregistered between-subjects study (N = 559) in which participants solved ten LSAT-style reasoning problems under one of three conditions: an Answer-only baseline, a Full-trace revealed before the answer, and a Summary-trace presented alongside the answer. Summaries preserved task performance at the no-trace baseline while significantly elevating trust and hedonic appeal, establishing that trace exposure shifts subjective appraisal of the interaction without bringing performance benefits. Under an open-weight reasoning model exposing verbose intermediate output, full traces additionally impaired performance relative to the answer-only baseline. Across all conditions, participants substantially overestimated their performance, and no trace format supported calibrated self-evaluation. Further analysis indicates that hedonic appeal, not trust, carries the indirect path to overestimation, consistent with a processing-fluency account. Reasoning traces are best understood as user-facing interface artifacts rather than transparent windows into model cognition, and calibration is unlikely to emerge from the traces themselves and may best be scaffolded by interactions that elicit users' own reasoning first.
AI Jan 29
Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks
Jennifer Haase, Jana Gonnermann-Müller, Paul H. P. Hanel et al.
How much of LLM output variance is explained by prompts versus model choice versus stochasticity through sampling? We answer this by evaluating 12 LLMs on 10 creativity prompts with 100 samples each (N = 12,000). For output quality (originality), prompts explain 36.43% of variance, comparable to model choice (40.94%). But for output quantity (fluency), model choice (51.25%) and within-LLM variance (33.70%) dominate, with prompts explaining only 4.22%. Prompts are powerful levers for steering output quality, but given the substantial within-LLM variance (10-34%), single-sample evaluations risk conflating sampling noise with genuine prompt or model effects.
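The question posed here, how much variance is attributable to prompts, to model choice, and to sampling, can be illustrated with a sum-of-squares decomposition over a balanced models × prompts × samples design. The sketch below is a generic two-way ANOVA-style decomposition on simulated data; it is an assumption about one way to compute such shares, not the authors' analysis, and the function names (`variance_shares`, `simulate`) and effect sizes are hypothetical.

```python
import random
from statistics import fmean


def variance_shares(scores):
    """Split the total sum of squares of a balanced design into shares.

    scores[m][p] is the list of sampled scores for model m under prompt p.
    Returns the fraction of total variation attributed to each source.
    """
    n_models, n_prompts, n_samples = len(scores), len(scores[0]), len(scores[0][0])
    grand = fmean(x for m in scores for cell in m for x in cell)
    model_means = [fmean(x for cell in m for x in cell) for m in scores]
    prompt_means = [fmean(x for m in scores for x in m[p]) for p in range(n_prompts)]
    cell_means = [[fmean(cell) for cell in m] for m in scores]

    ss = {
        # Between-model variation.
        "model": n_prompts * n_samples * sum((mm - grand) ** 2 for mm in model_means),
        # Between-prompt variation.
        "prompt": n_models * n_samples * sum((pm - grand) ** 2 for pm in prompt_means),
        # Model-by-prompt interaction.
        "model_x_prompt": n_samples * sum(
            (cell_means[i][j] - model_means[i] - prompt_means[j] + grand) ** 2
            for i in range(n_models) for j in range(n_prompts)
        ),
        # Within-cell variation, i.e. sampling noise for a fixed model and prompt.
        "within": sum(
            (x - cell_means[i][j]) ** 2
            for i in range(n_models) for j in range(n_prompts)
            for x in scores[i][j]
        ),
    }
    total = sum(ss.values())
    return {source: value / total for source, value in ss.items()}


def simulate(n_models=12, n_prompts=10, n_samples=100, seed=0):
    """Simulated scores with additive model and prompt effects plus noise (made-up values)."""
    rng = random.Random(seed)
    model_effect = [rng.gauss(0, 1.0) for _ in range(n_models)]
    prompt_effect = [rng.gauss(0, 1.0) for _ in range(n_prompts)]
    return [
        [
            [model_effect[m] + prompt_effect[p] + rng.gauss(0, 1.0) for _ in range(n_samples)]
            for p in range(n_prompts)
        ]
        for m in range(n_models)
    ]


shares = variance_shares(simulate())
for source, share in shares.items():
    print(f"{source}: {share:.1%}")
```

The default sizes mirror the design stated in the abstract (12 models × 10 prompts × 100 samples = 12,000 outputs). A large `within` share is what makes single-sample comparisons unreliable: one draw per cell cannot separate sampling noise from a genuine prompt or model effect.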
DC Nov 3, 2023
Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT
Mario Sänger, Ninon De Mecquenem, Katarzyna Ewa Lewińska et al.
Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets, as they offer reproducibility, dependability, and scalability of analyses by automatic parallelization on large compute clusters. However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution. Simultaneously, user-supporting tools are rare, and the number of available examples is much lower than in classical programming languages. To address these challenges, we investigate the efficiency of Large Language Models (LLMs), specifically ChatGPT, to support users when dealing with scientific workflows. We performed three user studies in two scientific domains to evaluate ChatGPT for comprehending, adapting, and extending workflows. Our results indicate that LLMs efficiently interpret workflows but achieve lower performance for exchanging components or purposeful workflow extensions. We characterize their limitations in these challenging scenarios and suggest future research directions.
78.8 HC May 7
LLM-Based Educational Simulation: Evaluating Temporal Student Persona Stability Across ADHD Profiles
Jana Gonnermann-Müller, Jennifer Haase, Nicolas Leins et al.
Student simulation with Large Language Models (LLMs) offers a scalable alternative for educational research and teacher training. Yet, its validity depends on whether models maintain stable personas across extended interactions. We test this prerequisite using a dual-assessment framework measuring self-reported characteristics and observer-rated behavioral expressions. Across two experiments testing four clinically grounded ADHD persona conditions, five LLMs, and three prompt designs, we quantify between-conversation stability (N=4,968) and within-conversation stability (N=3,952 across 9 turns). Self-reported characteristics remain stable for high intensities, constituting a necessary prerequisite for valid behavioral simulation. Observer-rated behavioral expression reveals selective instability: within-conversation drift occurs in unscripted dialog for high and moderate ADHD personas. Scripted interactions with explicit task prompts eliminate this drift entirely. Stable, persona-aligned simulated learners benefit from a structured interaction design to maintain behavioral coherence, which holds significant implications for teacher training, adaptive tutoring, and any application requiring sustained, path-dependent learner interactions.
HC Jan 28, 2024
HappyRouting: Learning Emotion-Aware Route Trajectories for Scalable In-The-Wild Navigation
David Bethge, Daniel Bulanda, Adam Kozlowski et al.
Routes represent an integral part of triggering emotions in drivers. Navigation systems allow users to choose a navigation strategy, such as the fastest or shortest route. However, they do not consider the driver's emotional well-being. We present HappyRouting, a novel navigation-based empathic car interface guiding drivers through real-world traffic while evoking positive emotions. We propose design considerations, derive a technical architecture, and implement a routing optimization framework. Our contribution is a machine learning-based generated emotion map layer, predicting emotions along routes based on static and dynamic contextual data. We evaluated HappyRouting in a real-world driving study (N=13), finding that happy routes increase subjectively perceived valence by 11% (p=.007). Although happy routes take 1.25 times longer on average, participants perceived the happy route as shorter, presenting an emotion-enhanced alternative to today's fastest routing mechanisms. We discuss how emotion-based routing can be integrated into navigation apps, promoting emotional well-being for mobility use.
HC Jan 4, 2021
Supporting Musical Practice Sessions Through HMD-Based Augmented Reality
Karola Marky, Andreas Weiß, Thomas Kosch
Learning a musical instrument requires a lot of practice, which, ideally, should be done every day. During practice sessions, students are on their own for the overwhelming majority of the time, but access to experts who support students "just-in-time" is limited. Therefore, students commonly do not receive any feedback during their practice sessions. Adequate feedback, especially for beginners, is highly important for three particular reasons: (1) preventing the acquisition of wrong motions, (2) avoiding frustration due to a steep learning curve, and (3) preventing potential health problems that arise from straining muscles or joints harmfully. In this paper, we envision the usage of head-mounted displays as an assistance modality to support musical instrument learning. We propose a modular concept for several assistance modes to help students during their practice sessions. Finally, we discuss hardware requirements and implementations to realize the proposed concepts.
HC Dec 20, 2020
Enabling Tangible Interaction through Detection and Augmentation of Everyday Objects
Thomas Kosch, Albrecht Schmidt
Digital interaction with everyday objects has become popular since the proliferation of camera-based systems that detect and augment objects "just-in-time". Common systems use a vision-based approach to detect objects and display their functionalities to the user. Sensors, such as color and depth cameras, have become inexpensive and allow seamless environmental tracking in mobile as well as stationary settings. However, object detection in different contexts faces challenges as it highly depends on environmental parameters and the conditions of the object itself. In this work, we present three tracking algorithms that we have employed in past research projects to track and recognize objects. We show how mobile and stationary augmented reality can be used to extend the functionalities of objects. We conclude by discussing how common items can provide user-defined tangible interaction beyond their regular functionality.
HC Oct 20, 2020
Don't Drone Yourself in Work: Discussing DronOS as a Framework for Human-Drone Interaction
Matthias Hoppe, Yannick Weiß, Marinus Burger et al.
More and more off-the-shelf drones provide frameworks that enable the programming of flight paths. These frameworks provide vendor-dependent programming and communication interfaces that are intended for flight path definitions. However, they are often limited to outdoor and GPS-based use only. A key disadvantage of such solutions is that they are complicated to use and require readjustments when changing the drone model. This is time-consuming since it requires redefining the flight path for the new framework. This workshop paper proposes additional features for DronOS, a community-driven framework that enables model-independent automatisation and programming of drones. We enhanced DronOS to include additional functions to account for the specific design constraints in human-drone interaction. This paper provides a starting point for discussing the requirements involved in designing a drone system with other researchers within the human-drone interaction community. We envision DronOS as a community-driven framework that can be applied to generic drone models, hence enabling the automatisation for any commercially available drone. Our goal is to build DronOS as a software tool that can be easily used by researchers and practitioners to prototype novel drone-based systems.
HC Oct 15, 2020
Workload-Aware Systems and Interfaces for Cognitive Augmentation
Thomas Kosch
In today's society, our cognition is constantly influenced by information intake, attention switching, and task interruptions. This increases the difficulty of a given task, adding to the existing workload and leading to compromised cognitive performance. The human body expresses the use of cognitive resources through physiological responses when confronted with a high cognitive workload. This temporarily mobilizes additional resources to deal with the workload at the cost of accelerated mental exhaustion. We predict that recent developments in physiological sensing will increasingly create user interfaces that are aware of the user's cognitive capacities, hence able to intervene when high or low states of cognitive workload are detected. Subsequently, we investigate suitable feedback modalities in a user-centric design process that are desirable for cognitive assistance. We then investigate different physiological sensing modalities to enable suitable real-time assessments of cognitive workload. We provide evidence that the human brain and eye gaze are sensitive to fluctuations in cognitive resting states. We show that electroencephalography and eye tracking are reliable modalities to assess mental workload during user interface operation. In the end, we present applications that regulate cognitive workload in home and work settings, investigate how cognitive workload can be visualized to the user, and show how cognitive workload measurements can be used to predict the efficiency of information intake through reading interfaces. Finally, we present our vision of future workload-aware interfaces. Previous interfaces were limited in their ability to utilize cognitive workload for user interaction. Together with the collected data sets, this thesis paves the way for methodical and technical tools that integrate workload-awareness as a factor for context-aware systems.