AI Nov 15, 2023
Forms of Understanding for XAI-Explanations
Hendrik Buschmeier, Heike M. Buhl, Friederike Kern et al.
Explainability has become an important topic in computer science and artificial intelligence, leading to a subfield called Explainable Artificial Intelligence (XAI). The goal of providing or seeking explanations is to achieve (better) 'understanding' on the part of the explainee. However, what it means to 'understand' is still not clearly defined, and the concept itself is rarely the subject of scientific investigation. This conceptual article aims to present a model of forms of understanding for XAI-explanations and beyond. From an interdisciplinary perspective bringing together computer science, linguistics, sociology, philosophy and psychology, a definition of understanding and its forms, assessment, and dynamics during the process of giving everyday explanations are explored. Two types of understanding are considered as possible outcomes of explanations, namely enabledness, 'knowing how' to do or decide something, and comprehension, 'knowing that', both in different degrees (from shallow to deep). Explanations regularly start with shallow understanding in a specific domain and can lead to deep comprehension and enabledness of the explanandum, which we see as a prerequisite for human users to gain agency. In this process, increases in comprehension and enabledness are highly interdependent. Against the background of this systematization, special challenges of understanding in XAI are discussed.
HC Mar 20
Sense4HRI: A ROS 2 HRI Framework for Physiological Sensor Integration and Synchronized Logging
Manuel Scheibl, Julian Leichert, Sinem Görmez et al.
Physiological signals are increasingly relevant for estimating the mental states of users in human-robot interaction (HRI), yet ROS 2-based HRI frameworks still lack reusable support for integrating such data streams in a standardized way. Therefore, we propose Sense4HRI, an adapted framework for human-robot interaction in ROS 2 that integrates physiological measurements and derived user-state indicators. The framework is designed to be extensible, allowing the integration of additional physiological sensors, their interpretation, and multimodal fusion to provide a robust assessment of the mental states of users. In addition, it introduces reusable interfaces for timestamped physiological time-series data and supports synchronized logging of physiological signals together with experiment context, enabling interoperable and traceable multimodal analysis within ROS 2-based HRI systems.
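The idea of logging timestamped physiological samples together with experiment context can be illustrated with a minimal, ROS-free sketch. The record layouts (`PhysioSample`, `ContextEvent`) and the function `synchronized_log` are hypothetical names chosen for illustration and are not Sense4HRI's actual interfaces; the sketch also assumes that all streams share one clock and that each stream is already ordered by time.

```python
from dataclasses import dataclass
import heapq
from typing import Iterable, Iterator, Union


@dataclass(frozen=True)
class PhysioSample:
    """One timestamped physiological reading (hypothetical layout)."""
    stamp: float   # seconds on a clock shared by all streams (assumption)
    sensor: str    # e.g. "heart_rate"
    value: float


@dataclass(frozen=True)
class ContextEvent:
    """One timestamped experiment-context marker (hypothetical layout)."""
    stamp: float
    label: str     # e.g. "task_start"


Record = Union[PhysioSample, ContextEvent]


def synchronized_log(*streams: Iterable[Record]) -> Iterator[Record]:
    """Merge individually time-ordered streams into one time-ordered log."""
    return heapq.merge(*streams, key=lambda record: record.stamp)


# Made-up example data, for illustration only.
heart_rate = [PhysioSample(0.0, "heart_rate", 71.0),
              PhysioSample(1.0, "heart_rate", 74.0)]
skin = [PhysioSample(0.5, "eda", 0.31)]
context = [ContextEvent(0.2, "task_start"), ContextEvent(1.5, "task_end")]

log = list(synchronized_log(heart_rate, skin, context))
for record in log:
    print(record)
```

Merging on a shared timestamp keeps each sensor stream independent while still producing a single traceable record of what was measured and what the experiment was doing at that moment.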
RO Mar 31, 2025
Towards a cognitive architecture to enable natural language interaction in co-constructive task learning
Manuel Scheibl, Birte Richter, Alissa Müller et al.
This research addresses the question of which characteristics a cognitive architecture must have to leverage the benefits of natural language in Co-Constructive Task Learning (CCTL). To provide context, we first discuss Interactive Task Learning (ITL), the mechanisms of the human memory system, and the significance of natural language and multi-modality. Next, we examine the current state of cognitive architectures, analyzing their capabilities to inform a concept of CCTL grounded in multiple sources. We then integrate insights from various research domains to develop a unified framework. Finally, we conclude by identifying the remaining challenges and requirements necessary to achieve CCTL in Human-Robot Interaction (HRI).
RO May 24, 2023
From Interactive to Co-Constructive Task Learning
Anna-Lisa Vollmer, Daniel Leidner, Michael Beetz et al.
Humans have developed the capability to teach relevant aspects of new or adapted tasks to a social peer with very few task demonstrations by making use of scaffolding strategies that leverage prior knowledge and, importantly, prior joint experience to yield a joint understanding and a joint execution of the required steps to solve the task. This process has been discovered and analyzed in parent-infant interaction and constitutes a "co-construction", as it allows both the teacher and the learner to jointly contribute to the task. We propose to focus research in robot interactive learning on this co-construction process to enable robots to learn from non-expert users in everyday situations. In the following, we will review current proposals for interactive task learning and discuss their main contributions with respect to the entailed interaction. We then discuss our notion of co-construction and summarize research insights from adult-child and human-robot interactions to elucidate its nature in more detail. From this overview we finally derive research desiderata that cover the dimensions of architecture, representation, interaction and explainability.
RO Aug 26, 2021
Improving HRI through robot architecture transparency
Lukas Hindemith, Anna-Lisa Vollmer, Christiane B. Wiebel-Herboth et al.
In recent years, increased effort has been invested in improving the capabilities of robots. Nevertheless, human-robot interaction remains a complex field of application where errors occur frequently. The reasons for these errors can primarily be divided into two classes. First, the recent increase in capabilities has also widened the possible sources of errors on the robot's side. This entails problems in the perception of the world, but also faulty behavior based on errors in the system. Second, non-expert users frequently have incorrect assumptions about the functionality and limitations of a robotic system. This leads to incompatibilities between the user's behavior and the functioning of the robot's system, causing problems on the robot's side and in the human-robot interaction. While engineers constantly improve the reliability of robots, the user's understanding of robots and their limitations has to be addressed as well. In this work, we investigate ways to improve the understanding of robots. For this, we employ FAMILIAR - FunctionAl user Mental model by Increased LegIbility ARchitecture, a robot architecture that is transparent with regard to the robot's behavior and decision-making process. We conducted an online simulation user study to evaluate two complementary approaches to conveying and increasing knowledge about this architecture among non-expert users: a dynamic visualization of the system's processes and a visual programming interface. The results of this study reveal that visual programming improves knowledge about the architecture. Furthermore, we show that with increased knowledge about the control architecture of the robot, users were significantly better at reaching the interaction goal. Finally, we show that anthropomorphism may reduce interaction success.
RO Nov 5, 2020
Why robots should be technical: Correcting mental models through technical architecture concepts
Lukas Hindemith, Anna-Lisa Vollmer, Jan Phillip Göpfert et al.
Research in social robotics is commonly focused on designing robots that imitate human behavior. While this might increase a user's satisfaction and acceptance of robots at first glance, it does not automatically aid a non-expert user in naturally interacting with robots, and might actually hurt their ability to correctly anticipate a robot's capabilities. We argue that a faulty mental model that the user has of the robot is one of the main sources of confusion. In this work, we investigate how communicating technical concepts of robotic systems to users affects their mental models, and how this can increase the quality of human-robot interaction. We conducted an online study and investigated possible ways of improving users' mental models. Our results underline that communicating technical concepts can form an improved mental model. Consequently, we show the importance of consciously designing robots that express their capabilities and limitations.
HC Sep 30, 2017
Confirmation detection in human-agent interaction using non-lexical speech cues
Mara Brandt, Britta Wrede, Franz Kummert et al.
Even if only the acoustic channel is considered, human communication is highly multi-modal. Non-lexical cues provide a variety of information such as emotion or agreement. The ability to process such cues is highly relevant for spoken dialog systems, especially in assistance systems. In this paper, we focus on the recognition of non-lexical confirmations such as "mhm", as they enhance the system's ability to accurately interpret human intent in natural communication. The architecture uses a Support Vector Machine to detect confirmations based on acoustic features. In a systematic comparison, several feature sets were evaluated for their performance on a corpus of human-agent interaction in a setting with naive users, including elderly and cognitively impaired people. Our results show that using stacked formants as features yields an accuracy of 84%, outperforming regular formants and MFCC- or pitch-based features for online classification.
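The abstract does not define "stacked formants". One plausible reading, assumed here purely for illustration, is that formant values from a window of consecutive audio frames are concatenated into a single feature vector, which a classifier such as a Support Vector Machine could then consume. The function name `stack_frames`, the window size, and the formant values below are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def stack_frames(frames: Sequence[Sequence[float]], window: int) -> List[List[float]]:
    """Concatenate each run of `window` consecutive frames into one vector."""
    if window < 1:
        raise ValueError("window must be at least 1")
    stacked: List[List[float]] = []
    # Slide over the frame sequence; too-short inputs yield no vectors.
    for start in range(len(frames) - window + 1):
        vector: List[float] = []
        for frame in frames[start:start + window]:
            vector.extend(frame)
        stacked.append(vector)
    return stacked


# Made-up formant values (F1, F2 in Hz) for four consecutive frames.
formants = [[310.0, 2200.0], [320.0, 2150.0], [335.0, 2100.0], [340.0, 2080.0]]
features = stack_frames(formants, window=3)
print(len(features), len(features[0]))
```

Under this reading, each stacked vector carries a short stretch of temporal context, in contrast to a single frame of formant values on its own.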