LGJan 24, 2023
Investigating Labeler Bias in Face Annotation for Machine LearningLuke Haliburton, Sinksar Ghebremedhin, Robin Welsch et al. · cmu
In a world increasingly reliant on artificial intelligence, it is more important than ever to consider the ethical implications of artificial intelligence on humanity. One key under-explored challenge is labeler bias, which can create inherently biased datasets for training and subsequently lead to inaccurate or unfair decisions in healthcare, employment, education, and law enforcement. Hence, we conducted a study to investigate and measure the existence of labeler bias using images of people from different ethnicities and sexes in a labeling task. Our results show that participants possess stereotypes that influence their decision-making process and that labeler demographics impact assigned labels. We also discuss how labeler bias influences datasets and, subsequently, the models trained on them. Overall, a high degree of transparency must be maintained throughout the entire artificial intelligence training process to identify and correct biases in the data as early as possible.
HCFeb 11, 2023
The Impact of Expertise in the Loop for Exploring Machine RationalityChangkun Ou, Sven Mayer, Andreas Butz
Human-in-the-loop optimization utilizes human expertise to guide machine optimizers iteratively and search for an optimal solution in a solution space. While prior empirical studies mainly investigated novices, we analyzed the impact of the levels of expertise on the outcome quality and corresponding subjective satisfaction. We conducted a study (N=60) in text, photo, and 3D mesh optimization contexts. We found that novices can achieve an expert level of quality performance, but participants with higher expertise led to more optimization iteration with more explicit preference while keeping satisfaction low. In contrast, novices were more easily satisfied and terminated faster. Therefore, we identified that experts seek more diverse outcomes while the machine reaches optimal results, and the observed behavior can be used as a performance indicator for human-in-the-loop system designers to improve underlying models. We inform future research to be cautious about the impact of user expertise when designing human-in-the-loop systems.
HCMay 1
Toward a Unified Framework for Collaborative Design of Human-AI InteractionAnkur Bhatt, Sven Mayer
Human computer interaction is shifting from screen-based systems to multimodal interfaces where artificial intelligence powered systems increasingly interpret user intent through speech, gesture, and gaze. Yet users rarely understand how these interpretations are made, compromising trust and control. Existing approaches treat multimodal alignment, explainability, and human agency as separate concerns, leaving critical gaps in transparency and user oversight. We propose a Human Artificial Intelligence collaboration framework integrating these three principles as interdependent design requirements: 1) multimodal alignment for accurate intent interpretation, 2) interaction centric explainability delivering real time visual, textual, and audio feedback, and 3) agency preserving mechanisms enabling users to accept, reject, or modify artificial intelligence suggestions at any time. We presented the framework through two scenarios, collaborative design and extended reality warehouse robot collaboration, chosen to span differences in time pressure and error reversibility, with the latter situated in a domain where misinterpretation carries documented safety consequences. This approach reframes collaboration as a continuous interaction property, benefiting designers, researchers, and end users by ensuring that as artificial intelligence systems grow more proactive, user understanding and control remain first class design properties.
HCMar 6
Non-urgent Messages Do Not Jump into My Headset Suddenly! Adaptive Notification Design in Mixed RealityJingyao Zheng, Xian Wang, Sven Mayer et al.
Mixed reality (MR) notification systems currently display all messages in fixed central locations regardless of urgency, leading to unnecessary interruptions and cognitive overload. Drawing from previous MR/Virtual Reality (VR) notification design work and calm technology principles, we developed an adaptive notification system that adjusts spatial placement based on urgency levels: non-urgent notifications appear as peripheral icons accessible via head movement, moderately urgent messages anchor to the user's hand, and very urgent notifications transition progressively from peripheral to central view. Through a within-subjects study (N=18), we evaluated our adaptive system against the default centralised approach. Results demonstrate that the adaptive system significantly reduces mental workload (p=0.041), temporal workload (p=0.008), and frustration (p=0.004) while maintaining comparable notification awareness. Logistic regression analysis reveals that users prefer the adaptive system even with classification errors, provided the combined misclassification rate (disruptiveness + omission errors) remains below a determinable threshold. Our findings establish the first empirical evidence that urgency-based spatial notification distribution effectively addresses core MR usability challenges, offering practical design guidelines for immersive notification systems that balance user attention management with information accessibility.
CVAug 17, 2021
Neural Photofit: Gaze-based Mental Image ReconstructionFlorian Strohm, Ekta Sood, Sven Mayer et al.
We propose a novel method that leverages human fixations to visually decode the image a person has in mind into a photofit (facial composite). Our method combines three neural networks: An encoder, a scoring network, and a decoder. The encoder extracts image features and predicts a neural activation map for each face looked at by a human observer. A neural scoring network compares the human and neural attention and predicts a relevance score for each extracted image feature. Finally, image features are aggregated into a single feature vector as a linear combination of all features weighted by relevance which a decoder decodes into the final photofit. We train the neural scoring network on a novel dataset containing gaze data of 19 participants looking at collages of synthetic faces. We show that our method significantly outperforms a mean baseline predictor and report on a human study that shows that we can decode photofits that are visually plausible and close to the observer's mental image.