NC · Feb 28, 2025
How Metacognitive Architectures Remember Their Own Thoughts: A Systematic Review
Robin Nolte, Mihai Pomarlan, Ayden Janssen et al.
Background: Metacognition has gained significant attention for its potential to enhance the autonomy and adaptability of artificial agents, but it remains a fragmented field: diverse theories, terminologies, and design choices have led to disjointed developments and limited comparability across systems. Existing overviews remain at a conceptual level that does not differentiate between the underlying algorithms and representations or assess their respective success. Methods: We address this gap by performing an explorative systematic review. Reports were included if they described techniques enabling Computational Metacognitive Architectures (CMAs) to model, store, remember, and process their episodic metacognitive experiences, one of Flavell's (1979a) three foundational components of metacognition. Searches were conducted in 16 databases, consulted between December 2023 and June 2024. Data were extracted using a 20-item framework covering pertinent aspects. Results: A total of 101 reports on 35 distinct CMAs were included. Our findings show that metacognitive experiences may boost system performance and explainability, e.g., via self-repair. However, a lack of standardization and limited evaluations may hinder progress: only 17% of CMAs were quantitatively evaluated with respect to this review's focus, and significant terminological inconsistency limits cross-architecture synthesis. Systems also varied widely in memory content, data types, and employed algorithms. Discussion: Limitations include the non-iterative nature of the search query, heterogeneous data availability, and an under-representation of emergent, sub-symbolic CMAs. Future research should focus on standardization and evaluation, e.g., via community-driven challenges, and on transferring promising principles to emergent architectures.
HC · Mar 5
Not All Trust is the Same: Effects of Decision Workflow and Explanations in Human-AI Decision Making
Laura Spillner, Rachel Ringe, Robert Porzel et al.
A central challenge in AI-assisted decision making is achieving warranted, well-calibrated trust. Both overtrust (accepting incorrect AI recommendations) and undertrust (rejecting correct advice) should be prevented. Prior studies differ in two respects: the design of the decision workflow, where users either see the AI suggestion immediately (1-step setup) or must first submit their own decision (2-step setup); and how trust is measured, either through self-reports or behaviorally as reliance. We examined the effects and interactions of (a) the type of decision workflow, (b) the presence of explanations, and (c) users' domain knowledge and prior AI experience. We compared reported trust, reliance (agreement rate and switch rate), and overreliance. Results showed no evidence that a 2-step setup reduces overreliance. The decision workflow also did not directly affect self-reported trust, but there was a crossover interaction effect with domain knowledge and explanations, suggesting that the effects of explanations alone may not generalize across workflow setups. Finally, our findings confirm that reported trust and reliance behavior are distinct constructs that should be evaluated separately in AI-assisted decision making.
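The abstract names the reliance measures but does not define them. The sketch below uses definitions that are common in the reliance literature but are an assumption here: agreement rate is the share of trials whose final decision matches the AI, switch rate is the share of initially disagreeing trials in which the participant adopted the AI's answer, and overreliance is agreement with the AI on trials where the AI was wrong. The `Trial` record and all function names are hypothetical. Because switch rate needs an initial decision, it only applies to the 2-step setup.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One decision trial in a hypothetical 2-step setup."""
    initial: str  # participant's first decision, made before seeing the AI
    ai: str       # the AI's recommendation
    final: str    # participant's final decision
    truth: str    # ground-truth correct answer


def agreement_rate(trials):
    """Share of trials whose final decision matches the AI suggestion."""
    return sum(t.final == t.ai for t in trials) / len(trials)


def switch_rate(trials):
    """Among trials where the initial decision disagreed with the AI,
    the share in which the participant switched to the AI's answer."""
    disagreed = [t for t in trials if t.initial != t.ai]
    if not disagreed:
        return float("nan")
    return sum(t.final == t.ai for t in disagreed) / len(disagreed)


def overreliance(trials):
    """Among trials where the AI was wrong, the share in which the
    participant's final decision followed the AI anyway."""
    wrong = [t for t in trials if t.ai != t.truth]
    if not wrong:
        return float("nan")
    return sum(t.final == t.ai for t in wrong) / len(wrong)


# Illustrative (made-up) trials, not data from the study.
trials = [
    Trial(initial="A", ai="A", final="A", truth="A"),  # agreed from the start
    Trial(initial="B", ai="A", final="A", truth="A"),  # switched to a correct AI
    Trial(initial="A", ai="B", final="A", truth="A"),  # rejected a wrong AI
    Trial(initial="A", ai="B", final="B", truth="A"),  # switched to a wrong AI
]
```

Computing overreliance only over trials with a wrong AI suggestion keeps it separate from plain agreement, which also rewards following a correct AI; this mirrors the abstract's point that different trust measures capture different things.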
HC · Aug 11, 2025
Can AI Explanations Make You Change Your Mind?
Laura Spillner, Rachel Ringe, Robert Porzel et al.
In the context of AI-based decision support systems (DSS), explanations can help users judge when to trust the AI's suggestion and when to question it. In this way, human oversight can prevent AI errors and biased decision-making. However, this rests on the assumption that users will consider explanations in enough detail to catch such errors. We conducted an online study on trust in explainable DSS and were surprised to find that, in many cases, participants spent little time on the explanation and did not always consider it in detail. We present an exploratory analysis of this data, investigating which factors affect how carefully study participants consider AI explanations, and how this in turn affects whether they are open to changing their mind based on what the AI suggests.
AI · Jul 29, 2025
Finding Uncommon Ground: A Human-Centered Model for Extrospective Explanations
Laura Spillner, Nima Zargham, Mihai Pomarlan et al.
The need for explanations in AI has, by and large, been driven by the desire to increase the transparency of black-box machine learning models. However, such explanations, which focus on the internal mechanisms that lead to a specific output, are often unsuitable for non-experts. To facilitate a human-centered perspective on AI explanations, agents need to focus on individuals and their preferences as well as the context in which the explanations are given. This paper proposes a personalized approach to explanation, where the agent tailors the information provided to the user based on what is most likely pertinent to them. We propose a model of the agent's worldview that also serves as a personal and dynamic memory of its previous interactions with the same user, based on which the artificial agent can estimate what part of its knowledge is most likely new information to the user.
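The abstract does not specify how the agent estimates which knowledge is new to a user. The sketch below takes the simplest possible reading, treating "likely new" as "not yet conveyed to this user in a previous interaction", whereas the proposed model produces an estimate rather than an exact set difference. The `UserMemory` class, its methods, and the example facts are all hypothetical.

```python
class UserMemory:
    """Hypothetical per-user memory of facts already conveyed in past interactions."""

    def __init__(self):
        self._known = {}  # user id -> set of facts already shared with that user

    def record(self, user, facts):
        """Remember that these facts were conveyed to this user."""
        self._known.setdefault(user, set()).update(facts)

    def likely_new(self, user, relevant_facts):
        """Return the relevant facts not yet shared with this user, in order.

        A real model would weigh likelihoods (e.g. forgetting, inference from
        related facts); this sketch only checks exact membership.
        """
        seen = self._known.get(user, set())
        return [f for f in relevant_facts if f not in seen]


mem = UserMemory()
mem.record("alice", {"the cup is in the cupboard"})
candidates = ["the cup is in the cupboard", "the cupboard is locked"]
to_explain = mem.likely_new("alice", candidates)
```

Keeping one memory per user is what makes the explanation personalized: the same candidate facts yield different explanations for users with different interaction histories.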
HC · Apr 24, 2021
Towards Low-burden Responses to Open Questions in VR
Dmitry Alexandrovsky, Susanne Putze, Alexander Schülke et al.
Providing subjective self-reports in VR user studies is a burdensome and often tedious task for participants. To minimize disruption of the ongoing experience, VR research has started to administer surveys directly inside the virtual environment. However, because text entry in VR is tedious, most VR surveying tools focus on closed questions with predetermined responses, while open questions with free-text responses remain unexplored. This neglects a crucial part of UX research. To provide guidance on suitable self-reporting methods for open questions in VR user studies, this position paper presents a comparative study of three text-entry methods in VR and outlines future directions towards low-burden qualitative responses.
HC · Jan 16, 2021
Evaluating User Experiences in Mixed Reality
Dmitry Alexandrovsky, Susanne Putze, Valentin Schwind et al.
Measuring user experience in MR (i.e., AR/VR) user studies is essential. Researchers apply a wide range of measurement methods using objective (e.g., biosignals, time logging), behavioral (e.g., gaze direction, movement amplitude), and subjective (e.g., standardized questionnaires) metrics. Many of these measurement instruments were adapted from use cases outside of MR but have not been validated for use in MR experiments. Moreover, researchers face various challenges and design alternatives when measuring immersive experiences. These challenges become even more diverse when running out-of-the-lab studies. Measurement methods for VR experiences have recently received much attention. For example, research has started embedding questionnaires in the VE for various applications, allowing users to stay closer to the ongoing experience while filling out the survey. However, the interaction methods and practices for conducting the assessment procedure vary widely. This diversity underlines the lack of a shared agreement on standardized measurement tools for VR experiences. AR research is strongly oriented toward the research methods of VR, e.g., using the same types of subjective questionnaires. However, some crucial technical differences require careful consideration during the evaluation. This workshop at CHI 2021 provides a foundation to exchange expertise and address challenges and opportunities of research methods in MR user studies. In this way, our workshop launches a discussion of research methods that should lead to standardized assessment methods in MR user studies. The outcomes of the workshop will be aggregated into a collective special-issue journal article.
RO · Nov 24, 2020
Foundations of the Socio-physical Model of Activities (SOMA) for Autonomous Robotic Agents
Daniel Beßler, Robert Porzel, Mihai Pomarlan et al.
In this paper, we present the foundations of the Socio-physical Model of Activities (SOMA). SOMA represents both the physical and the social context of everyday activities. Such tasks seem trivial to humans; however, they pose severe problems for artificial agents. For instance, a natural language command requesting something will leave unspecified many pieces of information necessary for performing the task. Humans can solve such problems quickly because we reduce the search space by drawing on prior knowledge, such as a connected collection of plans that describe how certain goals can be achieved at various levels of abstraction. Rather than enumerating fine-grained physical contexts, SOMA sets out to include socially constructed knowledge about the functions of actions for achieving a variety of goals, or the roles objects can play in a given situation. As human cognition is capable of generalizing experiences into abstract pieces of knowledge applicable to novel situations, we argue that both physical and social context need to be modeled to tackle these challenges in a general manner. This is represented by the link between the physical and social context in SOMA, where relationships are established between occurrences and their generalizations; this has been demonstrated in several use cases that validate SOMA.
HC · Jun 6, 2020
Towards Generating Virtual Movement from Textual Instructions: A Case Study in Quality Assessment
Himangshu Sarma, Robert Porzel, Jan Smeddinck et al.
Many application areas, ranging from serious games for health to learning by demonstration in robotics, could benefit from large body-movement datasets extracted from textual instructions accompanied by images. Interpreting instructions to automatically generate the corresponding motions (e.g., exercises) and validating these movements are difficult tasks. In this article, we describe a first step towards automated extraction. We recorded five different exercises in random order with seven amateur performers using a Kinect. During the recording, we found that each performer interpreted the same exercise differently, even though all were given identical textual instructions. Based on this data, we performed a quality assessment study using a crowdsourcing approach and tested the inter-rater agreement for different types of visualizations; the RGB-based visualization showed the best agreement among the annotators.
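The abstract does not say which inter-rater agreement statistic was used. Fleiss' kappa is one common choice when several annotators assign each item to one of a fixed set of categories, so the sketch below assumes that statistic. The `fleiss_kappa` function and its count-matrix input are illustrative, not the study's analysis code.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a count matrix.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters n (n >= 2).
    """
    N = len(counts)     # number of rated items
    n = sum(counts[0])  # raters per item
    k = len(counts[0])  # number of categories
    # Observed agreement for each item: share of agreeing rater pairs.
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N
    # Overall proportion of ratings that fell into each category.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Agreement expected by chance.
    P_e = sum(pj * pj for pj in p)
    if P_e == 1:
        return float("nan")  # every rating in one category: kappa undefined
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical ratings: 2 clips, 3 annotators, categories (good, bad).
perfect = [[3, 0], [0, 3]]  # annotators agree fully on both clips
split = [[2, 1], [1, 2]]    # annotators split 2:1 on both clips
```

Kappa corrects raw agreement for chance: the split example has some raw agreement but falls below what chance alone would produce, so its kappa is negative.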