LGMay 4, 2022
Machine Learning Operations (MLOps): Overview, Definition, and ArchitectureDominik Kreuzberger, Niklas Kühl, Sebastian Hirschl
The final goal of all industrial machine learning (ML) projects is to develop ML products and rapidly bring them into production. However, it is highly challenging to automate and operationalize ML products and thus many ML endeavors fail to deliver on their expectations. The paradigm of Machine Learning Operations (MLOps) addresses this issue. MLOps includes several aspects, such as best practices, sets of concepts, and development culture. However, MLOps is still a vague term and its consequences for researchers and professionals are ambiguous. To address this gap, we conduct mixed-method research, including a literature review, a tool review, and expert interviews. As a result of these investigations, we provide an aggregated overview of the necessary principles, components, and roles, as well as the associated architecture and workflows. Furthermore, we furnish a definition of MLOps and highlight open challenges in the field. Finally, this work provides guidance for ML researchers and practitioners who want to automate and operate their ML products with a designated set of technologies.
AIFeb 4, 2023
Appropriate Reliance on AI Advice: Conceptualization and the Effect of ExplanationsMax Schemmer, Niklas Kühl, Carina Benz et al.
AI advice is becoming increasingly popular, e.g., in investment and medical treatment decisions. As this advice is typically imperfect, decision-makers have to exert discretion as to whether actually follow that advice: they have to "appropriately" rely on correct and turn down incorrect advice. However, current research on appropriate reliance still lacks a common definition as well as an operational measurement concept. Additionally, no in-depth behavioral experiments have been conducted that help understand the factors influencing this behavior. In this paper, we propose Appropriateness of Reliance (AoR) as an underlying, quantifiable two-dimensional measurement concept. We develop a research model that analyzes the effect of providing explanations for AI advice. In an experiment with 200 participants, we demonstrate how these explanations influence the AoR, and, thus, the effectiveness of AI advice. Our work contributes fundamental concepts for the analysis of reliance behavior and the purposeful design of AI advisors.
HCJul 25, 2023
The Impact of Imperfect XAI on Human-AI Decision-MakingKatelyn Morrison, Philipp Spitzer, Violet Turri et al.
Explainability techniques are rapidly being developed to improve human-AI decision-making across various cooperative work settings. Consequently, previous research has evaluated how decision-makers collaborate with imperfect AI by investigating appropriate reliance and task performance with the aim of designing more human-centered computer-supported collaborative tools. Several human-centered explainable AI (XAI) techniques have been proposed in hopes of improving decision-makers' collaboration with AI; however, these techniques are grounded in findings from previous studies that primarily focus on the impact of incorrect AI advice. Few studies acknowledge the possibility of the explanations being incorrect even if the AI advice is correct. Thus, it is crucial to understand how imperfect XAI affects human-AI decision-making. In this work, we contribute a robust, mixed-methods user study with 136 participants to evaluate how incorrect explanations influence humans' decision-making behavior in a bird species identification task, taking into account their level of expertise and an explanation's level of assertiveness. Our findings reveal the influence of imperfect XAI and humans' level of expertise on their reliance on AI and human-AI team performance. We also discuss how explanations can deceive decision-makers during human-AI collaboration. Hence, we shed light on the impacts of imperfect XAI in the field of computer-supported cooperative work and provide guidelines for designers of human-AI collaboration systems.
AIDec 22, 2022
Data-Centric Artificial IntelligenceJohannes Jakubik, Michael Vössing, Niklas Kühl et al.
Data-centric artificial intelligence (data-centric AI) represents an emerging paradigm emphasizing that the systematic design and engineering of data is essential for building effective and efficient AI-based systems. The objective of this article is to introduce practitioners and researchers from the field of Information Systems (IS) to data-centric AI. We define relevant terms, provide key characteristics to contrast the data-centric paradigm to the model-centric one, and introduce a framework for data-centric AI. We distinguish data-centric AI from related concepts and discuss its longer-term implications for the IS community.
HCApr 14, 2022
Should I Follow AI-based Advice? Measuring Appropriate Reliance in Human-AI Decision-MakingMax Schemmer, Patrick Hemmer, Niklas Kühl et al.
Many important decisions in daily life are made with the help of advisors, e.g., decisions about medical treatments or financial investments. Whereas in the past, advice has often been received from human experts, friends, or family, advisors based on artificial intelligence (AI) have become more and more present nowadays. Typically, the advice generated by AI is judged by a human and either deemed reliable or rejected. However, recent work has shown that AI advice is not always beneficial, as humans have shown to be unable to ignore incorrect AI advice, essentially representing an over-reliance on AI. Therefore, the aspired goal should be to enable humans not to rely on AI advice blindly but rather to distinguish its quality and act upon it to make better decisions. Specifically, that means that humans should rely on the AI in the presence of correct advice and self-rely when confronted with incorrect advice, i.e., establish appropriate reliance (AR) on AI advice on a case-by-case basis. Current research lacks a metric for AR. This prevents a rigorous evaluation of factors impacting AR and hinders further development of human-AI decision-making. Therefore, based on the literature, we derive a measurement concept of AR. We propose to view AR as a two-dimensional construct that measures the ability to discriminate advice quality and behave accordingly. In this article, we derive the measurement concept, illustrate its application and outline potential future research.
HCApr 19, 2022
On the Influence of Explainable AI on Automation BiasMax Schemmer, Niklas Kühl, Carina Benz et al.
Artificial intelligence (AI) is gaining momentum, and its importance for the future of work in many areas, such as medicine and banking, is continuously rising. However, insights on the effective collaboration of humans and AI are still rare. Typically, AI supports humans in decision-making by addressing human limitations. However, it may also evoke human bias, especially in the form of automation bias as an over-reliance on AI advice. We aim to shed light on the potential to influence automation bias by explainable AI (XAI). In this pre-test, we derive a research model and describe our study design. Subsequentially, we conduct an online experiment with regard to hotel review classifications and discuss first results. We expect our research to contribute to the design and development of safe hybrid intelligence systems.
HCMay 3, 2022
On the Effect of Information Asymmetry in Human-AI TeamsPatrick Hemmer, Max Schemmer, Niklas Kühl et al.
Over the last years, the rising capabilities of artificial intelligence (AI) have improved human decision-making in many application areas. Teaming between AI and humans may even lead to complementary team performance (CTP), i.e., a level of performance beyond the ones that can be reached by AI or humans individually. Many researchers have proposed using explainable AI (XAI) to enable humans to rely on AI advice appropriately and thereby reach CTP. However, CTP is rarely demonstrated in previous work as often the focus is on the design of explainability, while a fundamental prerequisite -- the presence of complementarity potential between humans and AI -- is often neglected. Therefore, we focus on the existence of this potential for effective human-AI decision-making. Specifically, we identify information asymmetry as an essential source of complementarity potential, as in many real-world situations, humans have access to different contextual information. By conducting an online experiment, we demonstrate that humans can use such contextual information to adjust the AI's decision, finally resulting in CTP.
LGMar 17, 2023
Artificial Intelligence for Sustainability: Facilitating Sustainable Smart Product-Service Systems with Computer VisionJannis Walk, Niklas Kühl, Michael Saidani et al.
The usage and impact of deep learning for cleaner production and sustainability purposes remain little explored. This work shows how deep learning can be harnessed to increase sustainability in production and product usage. Specifically, we utilize deep learning-based computer vision to determine the wear states of products. The resulting insights serve as a basis for novel product-service systems with improved integration and result orientation. Moreover, these insights are expected to facilitate product usage improvements and R&D innovations. We demonstrate our approach on two products: machining tools and rotating X-ray anodes. From a technical standpoint, we show that it is possible to recognize the wear state of these products using deep-learning-based computer vision. In particular, we detect wear through microscopic images of the two products. We utilize a U-Net for semantic segmentation to detect wear based on pixel granularity. The resulting mean dice coefficients of 0.631 and 0.603 demonstrate the feasibility of the proposed approach. Consequently, experts can now make better decisions, for example, to improve the machining process parameters. To assess the impact of the proposed approach on environmental sustainability, we perform life cycle assessments that show gains for both products. The results indicate that the emissions of CO2 equivalents are reduced by 12% for machining tools and by 44% for rotating anodes. This work can serve as a guideline and inspire researchers and practitioners to utilize computer vision in similar scenarios to develop sustainable smart product-service systems and enable cleaner production.
HCMay 10, 2022
A Meta-Analysis of the Utility of Explainable Artificial Intelligence in Human-AI Decision-MakingMax Schemmer, Patrick Hemmer, Maximilian Nitsche et al.
Research in artificial intelligence (AI)-assisted decision-making is experiencing tremendous growth with a constantly rising number of studies evaluating the effect of AI with and without techniques from the field of explainable AI (XAI) on human decision-making performance. However, as tasks and experimental setups vary due to different objectives, some studies report improved user decision-making performance through XAI, while others report only negligible effects. Therefore, in this article, we present an initial synthesis of existing research on XAI studies using a statistical meta-analysis to derive implications across existing research. We observe a statistically positive impact of XAI on users' performance. Additionally, the first results indicate that human-AI decision-making tends to yield better task performance on text data. However, we find no effect of explanations on users' performance compared to sole AI predictions. Our initial synthesis gives rise to future research investigating the underlying causes and contributes to further developing algorithms that effectively benefit human decision-makers by providing meaningful explanations.
CVJan 23, 2023
Toward Foundation Models for Earth Monitoring: Generalizable Deep Learning Models for Natural Hazard SegmentationJohannes Jakubik, Michal Muszynski, Michael Vössing et al.
Climate change results in an increased probability of extreme weather events that put societies and businesses at risk on a global scale. Therefore, near real-time mapping of natural hazards is an emerging priority for the support of natural disaster relief, risk management, and informing governmental policy decisions. Recent methods to achieve near real-time mapping increasingly leverage deep learning (DL). However, DL-based approaches are designed for one specific task in a single geographic region based on specific frequency bands of satellite data. Therefore, DL models used to map specific natural hazards struggle with their generalization to other types of natural hazards in unseen regions. In this work, we propose a methodology to significantly improve the generalizability of DL natural hazards mappers based on pre-training on a suitable pre-task. Without access to any data from the target domain, we demonstrate this improved generalizability across four U-Net architectures for the segmentation of unseen natural hazards. Importantly, our method is invariant to geographic differences and differences in the type of frequency bands of satellite data. By leveraging characteristics of unlabeled images from the target domain that are publicly available, our approach is able to further improve the generalization behavior without fine-tuning. Thereby, our approach supports the development of foundation models for earth monitoring with the objective of directly segmenting unseen natural hazards across novel geographic regions given different sources of satellite imagery.
AIOct 3, 2023
Towards Effective Human-AI Decision-Making: The Role of Human Learning in Appropriate Reliance on AI AdviceMax Schemmer, Andrea Bartos, Philipp Spitzer et al.
The true potential of human-AI collaboration lies in exploiting the complementary capabilities of humans and AI to achieve a joint performance superior to that of the individual AI or human, i.e., to achieve complementary team performance (CTP). To realize this complementarity potential, humans need to exercise discretion in following AI 's advice, i.e., appropriately relying on the AI's advice. While previous work has focused on building a mental model of the AI to assess AI recommendations, recent research has shown that the mental model alone cannot explain appropriate reliance. We hypothesize that, in addition to the mental model, human learning is a key mediator of appropriate reliance and, thus, CTP. In this study, we demonstrate the relationship between learning and appropriate reliance in an experiment with 100 participants. This work provides fundamental concepts for analyzing reliance and derives implications for the effective design of human-AI decision-making.
HCSep 19, 2024
Don't be Fooled: The Misinformation Effect of Explanations in Human-AI CollaborationPhilipp Spitzer, Joshua Holstein, Katelyn Morrison et al.
Across various applications, humans increasingly use black-box artificial intelligence (AI) systems without insight into these systems' reasoning. To counter this opacity, explainable AI (XAI) methods promise enhanced transparency and interpretability. While recent studies have explored how XAI affects human-AI collaboration, few have examined the potential pitfalls caused by incorrect explanations. The implications for humans can be far-reaching but have not been explored extensively. To investigate this, we ran a study (n=160) on AI-assisted decision-making in which humans were supported by XAI. Our findings reveal a misinformation effect when incorrect explanations accompany correct AI advice with implications post-collaboration. This effect causes humans to infer flawed reasoning strategies, hindering task execution and demonstrating impaired procedural knowledge. Additionally, incorrect explanations compromise human-AI team-performance during collaboration. With our work, we contribute to HCI by providing empirical evidence for the negative consequences of incorrect explanations on humans post-collaboration and outlining guidelines for designers of AI.
LGMar 28, 2023
Enabling Inter-organizational Analytics in Business Networks Through Meta Machine LearningRobin Hirt, Niklas Kühl, Dominik Martin et al.
Successful analytics solutions that provide valuable insights often hinge on the connection of various data sources. While it is often feasible to generate larger data pools within organizations, the application of analytics within (inter-organizational) business networks is still severely constrained. As data is distributed across several legal units, potentially even across countries, the fear of disclosing sensitive information as well as the sheer volume of the data that would need to be exchanged are key inhibitors for the creation of effective system-wide solutions -- all while still reaching superior prediction performance. In this work, we propose a meta machine learning method that deals with these obstacles to enable comprehensive analyses within a business network. We follow a design science research approach and evaluate our method with respect to feasibility and performance in an industrial use case. First, we show that it is feasible to perform network-wide analyses that preserve data confidentiality as well as limit data transfer volume. Second, we demonstrate that our method outperforms a conventional isolated analysis and even gets close to a (hypothetical) scenario where all data could be shared within the network. Thus, we provide a fundamental contribution for making business networks more effective, as we remove a key obstacle to tap the huge potential of learning from data that is scattered throughout the network.
HCJul 1, 2022
Training Novices: The Role of Human-AI Collaboration and Knowledge TransferPhilipp Spitzer, Niklas Kühl, Marc Goutier
Across a multitude of work environments, expert knowledge is imperative for humans to conduct tasks with high performance and ensure business success. These humans possess task-specific expert knowledge (TSEK) and hence, represent subject matter experts (SMEs). However, not only demographic changes but also personnel downsizing strategies lead and will continue to lead to departures of SMEs within organizations, which constitutes the challenge of how to retain that expert knowledge and train novices to keep the competitive advantage elicited by that expert knowledge. SMEs training novices is time- and cost-intensive, which intensifies the need for alternatives. Human-AI collaboration (HAIC) poses a way out of this dilemma, facilitating alternatives to preserve expert knowledge and teach it to novices for tasks conducted by SMEs beforehand. In this workshop paper, we (1) propose a framework on how HAIC can be utilized to train novices on particular tasks, (2) illustrate the role of explicit and tacit knowledge in this training process via HAIC, and (3) outline a preliminary experiment design to assess the ability of AI systems in HAIC to act as a trainer to transfer TSEK to novices who do not possess prior TSEK.
LGApr 14, 2023
Learning to Defer with Limited Expert PredictionsPatrick Hemmer, Lukas Thede, Michael Vössing et al.
Recent research suggests that combining AI models with a human expert can exceed the performance of either alone. The combination of their capabilities is often realized by learning to defer algorithms that enable the AI to learn to decide whether to make a prediction for a particular instance or defer it to the human expert. However, to accurately learn which instances should be deferred to the human expert, a large number of expert predictions that accurately reflect the expert's capabilities are required -- in addition to the ground truth labels needed to train the AI. This requirement shared by many learning to defer algorithms hinders their adoption in scenarios where the responsible expert regularly changes or where acquiring a sufficient number of expert predictions is costly. In this paper, we propose a three-step approach to reduce the number of expert predictions required to train learning to defer algorithms. It encompasses (1) the training of an embedding model with ground truth labels to generate feature representations that serve as a basis for (2) the training of an expertise predictor model to approximate the expert's capabilities. (3) The expertise predictor generates artificial expert predictions for instances not yet labeled by the expert, which are required by the learning to defer algorithms. We evaluate our approach on two public datasets. One with "synthetically" generated human experts and another from the medical domain containing real-world radiologists' predictions. Our experiments show that the approach allows the training of various learning to defer algorithms with a minimal number of human expert predictions. Furthermore, we demonstrate that even a small number of expert predictions per class is sufficient for these algorithms to exceed the performance the AI and the human expert can achieve individually.
HCApr 19, 2022
Factors that influence the adoption of human-AI collaboration in clinical decision-makingPatrick Hemmer, Max Schemmer, Lara Riefle et al.
Recent developments in Artificial Intelligence (AI) have fueled the emergence of human-AI collaboration, a setting where AI is a coequal partner. Especially in clinical decision-making, it has the potential to improve treatment quality by assisting overworked medical professionals. Even though research has started to investigate the utilization of AI for clinical decision-making, its potential benefits do not imply its adoption by medical professionals. While several studies have started to analyze adoption criteria from a technical perspective, research providing a human-centered perspective with a focus on AI's potential for becoming a coequal team member in the decision-making process remains limited. Therefore, in this work, we identify factors for the adoption of human-AI collaboration by conducting a series of semi-structured interviews with experts in the healthcare domain. We identify six relevant adoption factors and highlight existing tensions between them and effective human-AI collaboration.
LGFeb 7, 2023
Towards Meaningful Anomaly Detection: The Effect of Counterfactual Explanations on the Investigation of Anomalies in Multivariate Time SeriesMax Schemmer, Joshua Holstein, Niklas Bauer et al.
Detecting rare events is essential in various fields, e.g., in cyber security or maintenance. Often, human experts are supported by anomaly detection systems as continuously monitoring the data is an error-prone and tedious task. However, among the anomalies detected may be events that are rare, e.g., a planned shutdown of a machine, but are not the actual event of interest, e.g., breakdowns of a machine. Therefore, human experts are needed to validate whether the detected anomalies are relevant. We propose to support this anomaly investigation by providing explanations of anomaly detection. Related work only focuses on the technical implementation of explainable anomaly detection and neglects the subsequent human anomaly investigation. To address this research gap, we conduct a behavioral experiment using records of taxi rides in New York City as a testbed. Participants are asked to differentiate extreme weather events from other anomalous events such as holidays or sporting events. Our results show that providing counterfactual explanations do improve the investigation of anomalies, indicating potential for explainable anomaly detection in general.
AIOct 15, 2023
A Critical Survey on Fairness Benefits of Explainable AILuca Deck, Jakob Schoeffer, Maria De-Arteaga et al.
In this critical survey, we analyze typical claims on the relationship between explainable AI (XAI) and fairness to disentangle the multidimensional relationship between these two concepts. Based on a systematic literature review and a subsequent qualitative content analysis, we identify seven archetypal claims from 175 scientific articles on the alleged fairness benefits of XAI. We present crucial caveats with respect to these claims and provide an entry point for future discussions around the potentials and limitations of XAI for specific fairness desiderata. Importantly, we notice that claims are often (i) vague and simplistic, (ii) lacking normative grounding, or (iii) poorly aligned with the actual capabilities of XAI. We suggest to conceive XAI not as an ethical panacea but as one of many tools to approach the multidimensional, sociotechnical challenge of algorithmic fairness. Moreover, when making a claim about XAI and fairness, we emphasize the need to be more specific about what kind of XAI method is used, which fairness desideratum it refers to, how exactly it enables fairness, and who is the stakeholder that benefits from XAI.
CVSep 22, 2022
Deep Domain Adaptation for Detecting Bomb Craters in Aerial ImagesMarco Geiger, Dominik Martin, Niklas Kühl
The aftermath of air raids can still be seen for decades after the devastating events. Unexploded ordnance (UXO) is an immense danger to human life and the environment. Through the assessment of wartime images, experts can infer the occurrence of a dud. The current manual analysis process is expensive and time-consuming, thus automated detection of bomb craters by using deep learning is a promising way to improve the UXO disposal process. However, these methods require a large amount of manually labeled training data. This work leverages domain adaptation with moon surface images to address the problem of automated bomb crater detection with deep learning under the constraint of limited training data. This paper contributes to both academia and practice (1) by providing a solution approach for automated bomb crater detection with limited training data and (2) by demonstrating the usability and associated challenges of using synthetic images for domain adaptation.
HCJul 22, 2024
A Survey of AI RelianceSven Eckhardt, Niklas Kühl, Mateusz Dolata et al.
Although artificial intelligence (AI) systems are becoming increasingly indispensable, research into how humans rely on these systems (AI reliance) is lagging behind. To advance this research, this survey presents a novel, comprehensive sociotechnical perspective on AI reliance, essential to fully understand the phenomenon. To address these challenges, the survey introduces a categorization framework resulting in a morphological box, which guides rigorous AI reliance research. Further, the survey identifies the core influences on AI reliance within the components of a sociotechnical system and discusses current limitations alongside emerging future research avenues to form a research agenda.
AIFeb 3, 2023
Towards Avoiding the Data Mess: Industry Insights from Data Mesh ImplementationsJan Bode, Niklas Kühl, Dominik Kreuzberger et al.
With the increasing importance of data and artificial intelligence, organizations strive to become more data-driven. However, current data architectures are not necessarily designed to keep up with the scale and scope of data and analytics use cases. In fact, existing architectures often fail to deliver the promised value associated with them. Data mesh is a socio-technical, decentralized, distributed concept for enterprise data management. As the concept of data mesh is still novel, it lacks empirical insights from the field. Specifically, an understanding of the motivational factors for introducing data mesh, the associated challenges, implementation strategies, its business impact, and potential archetypes is missing. To address this gap, we conduct 15 semi-structured interviews with industry experts. Our results show, among other insights, that organizations have difficulties with the transition toward federated governance associated with the data mesh concept, the shift of responsibility for the development, provision, and maintenance of data products, and the comprehension of the overall concept. In our work, we derive multiple implementation strategies and suggest organizations introduce a cross-domain steering unit, observe the data product usage, create quick wins in the early phases, and favor small dedicated teams that prioritize data products. While we acknowledge that organizations need to apply implementation strategies according to their individual needs, we also deduct two archetypes that provide suggestions in more detail. Our findings synthesize insights from industry experts and provide researchers and professionals with preliminary guidelines for the successful adoption of data mesh.
CVNov 16, 2023
Redefining the Laparoscopic Spatial Sense: AI-based Intra- and Postoperative Measurement from StereoimagesLeopold Müller, Patrick Hemmer, Moritz Queisner et al.
A significant challenge in image-guided surgery is the accurate measurement task of relevant structures such as vessel segments, resection margins, or bowel lengths. While this task is an essential component of many surgeries, it involves substantial human effort and is prone to inaccuracies. In this paper, we develop a novel human-AI-based method for laparoscopic measurements utilizing stereo vision that has been guided by practicing surgeons. Based on a holistic qualitative requirements analysis, this work proposes a comprehensive measurement method, which comprises state-of-the-art machine learning architectures, such as RAFT-Stereo and YOLOv8. The developed method is assessed in various realistic experimental evaluation environments. Our results outline the potential of our method achieving high accuracies in distance measurements with errors below 1 mm. Furthermore, on-surface measurements demonstrate robustness when applied in challenging environments with textureless regions. Overall, by addressing the inherent challenges of image-guided surgery, we lay the foundation for a more robust and accurate solution for intra- and postoperative measurements, enabling more precise, safe, and efficient surgical procedures.
HCApr 19, 2023
On the Perception of Difficulty: Differences between Humans and AIPhilipp Spitzer, Joshua Holstein, Michael Vössing et al.
With the increased adoption of artificial intelligence (AI) in industry and society, effective human-AI interaction systems are becoming increasingly important. A central challenge in the interaction of humans with AI is the estimation of difficulty for human and AI agents for single task instances.These estimations are crucial to evaluate each agent's capabilities and, thus, required to facilitate effective collaboration. So far, research in the field of human-AI interaction estimates the perceived difficulty of humans and AI independently from each other. However, the effective interaction of human and AI agents depends on metrics that accurately reflect each agent's perceived difficulty in achieving valuable outcomes. Research to date has not yet adequately examined the differences in the perceived difficulty of humans and AI. Thus, this work reviews recent research on the perceived difficulty in human-AI interaction and contributing factors to consistently compare each agent's perceived difficulty, e.g., creating the same prerequisites. Furthermore, we present an experimental design to thoroughly examine the perceived difficulty of both agents and contribute to a better understanding of the design of such systems.
LGAug 16, 2024
A Multivocal Literature Review on Privacy and Fairness in Federated LearningBeatrice Balbierer, Lukas Heinlein, Domenique Zipperling et al.
Federated Learning presents a way to revolutionize AI applications by eliminating the necessity for data sharing. Yet, research has shown that information can still be extracted during training, making additional privacy-preserving measures such as differential privacy imperative. To implement real-world federated learning applications, fairness, ranging from a fair distribution of performance to non-discriminative behaviour, must be considered. Particularly in high-risk applications (e.g. healthcare), avoiding the repetition of past discriminatory errors is paramount. As recent research has demonstrated an inherent tension between privacy and fairness, we conduct a multivocal literature review to examine the current methods to integrate privacy and fairness in federated learning. Our analyses illustrate that the relationship between privacy and fairness has been neglected, posing a critical risk for real-world applications. We highlight the need to explore the relationship between privacy, fairness, and performance, advocating for the creation of integrated federated learning frameworks.
CVMar 24, 2025Code
Towards Human-Understandable Multi-Dimensional Concept DiscoveryArne Grobrügge, Niklas Kühl, Gerhard Satzger et al.
Concept-based eXplainable AI (C-XAI) aims to overcome the limitations of traditional saliency maps by converting pixels into human-understandable concepts that are consistent across an entire dataset. A crucial aspect of C-XAI is completeness, which measures how well a set of concepts explains a model's decisions. Among C-XAI methods, Multi-Dimensional Concept Discovery (MCD) effectively improves completeness by breaking down the CNN latent space into distinct and interpretable concept subspaces. However, MCD's explanations can be difficult for humans to understand, raising concerns about their practical utility. To address this, we propose Human-Understandable Multi-dimensional Concept Discovery (HU-MCD). HU-MCD uses the Segment Anything Model for concept identification and implements a CNN-specific input masking technique to reduce noise introduced by traditional masking methods. These changes to MCD, paired with the completeness relation, enable HU-MCD to enhance concept understandability while maintaining explanation faithfulness. Our experiments, including human subject studies, show that HU-MCD provides more precise and reliable explanations than existing C-XAI methods. The code is available at https://github.com/grobruegge/hu-mcd.
90.6HCApr 15
Smart But Not Moral? Moral Alignment In Human-AI Decision-MakingChristiane Ernst, Luis Gutmann, Domenique Zipperling et al.
In high-stakes AI-supported decisions, considerations are not purely technical but involve moral judgments about fairness, responsibility, and harm. While prior research has focused mainly on functional or behavioral alignment, this paper argues that moral alignment may be a more fundamental dimension of human-AI decision-making. Moral alignment is defined as the perceived congruence between the values embedded in an AI system's decision logic and the moral intuitions of stakeholders. Building on Moral Foundations Theory, the paper adopts a multi-stakeholder perspective and highlights why moral (mis)alignment matters for the meaningful integration of AI in sensitive contexts.
49.2AIMar 12
Normative Common Ground Replication (NormCoRe): Replication-by-Translation for Studying Norms in Multi-agent AILuca Deck, Simeon Allmendinger, Lucas Müller et al.
In the late 2010s, the fashion trend NormCore framed sameness as a signal of belonging, illustrating how norms emerge through collective coordination. Today, similar forms of normative coordination can be observed in systems based on Multi-agent Artificial Intelligence (MAAI), as AI-based agents deliberate, negotiate, and converge on shared decisions in fairness-sensitive domains. Yet, existing empirical approaches often treat norms as targets for alignment or replication, implicitly assuming equivalence between human subjects and AI agents and leaving collective normative dynamics insufficiently examined. To address this gap, we propose Normative Common Ground Replication (NormCoRe), a novel methodological framework to systematically translate the design of human subject experiments into MAAI environments. Building on behavioral science, replication research, and state-of-the-art MAAI architectures, NormCoRe maps the structural layers of human subject studies onto the design of AI agent studies, enabling systematic documentation of study design and analysis of norms in MAAI. We demonstrate the utility of NormCoRe by replicating a seminal experimental study on distributive justice, in which participants negotiate fairness principles under a "veil of ignorance". We show that normative judgments in AI agent studies can differ from human baselines and are sensitive to the choice of the foundation model and the language used to instantiate agent personas. Our work provides a principled pathway for analyzing norms in MAAI and helps to guide, reflect, and document design choices whenever AI agents are used to automate or support tasks formerly carried out by humans.
DCMay 29, 2020Code
AI-based Resource Allocation: Reinforcement Learning for Adaptive Auto-scaling in Serverless EnvironmentsLucia Schuler, Somaya Jamil, Niklas Kühl
Serverless computing has emerged as a compelling new paradigm of cloud computing models in recent years. It promises the user services at large scale and low cost while eliminating the need for infrastructure management. On cloud provider side, flexible resource management is required to meet fluctuating demand. It can be enabled through automated provisioning and deprovisioning of resources. A common approach among both commercial and open source serverless computing platforms is workload-based auto-scaling, where a designated algorithm scales instances according to the number of incoming requests. In the recently evolving serverless framework Knative a request-based policy is proposed, where the algorithm scales resources by a configured maximum number of requests that can be processed in parallel per instance, the so-called concurrency. As we show in a baseline experiment, this predefined concurrency level can strongly influence the performance of a serverless application. However, identifying the concurrency configuration that yields the highest possible quality of service is a challenging task due to various factors, e.g. varying workload and complex infrastructure characteristics, influencing throughput and latency. While there has been considerable research into intelligent techniques for optimizing auto-scaling for virtual machine provisioning, this topic has not yet been discussed in the area of serverless computing. For this reason, we investigate the applicability of a reinforcement learning approach, which has been proven on dynamic virtual machine provisioning, to request-based auto-scaling in a serverless framework. Our results show that within a limited number of iterations our proposed model learns an effective scaling policy per workload, improving the performance compared to the default auto-scaling configuration.
HCMar 21, 2024
Complementarity in Human-AI Collaboration: Concept, Sources, and EvidencePatrick Hemmer, Max Schemmer, Niklas Kühl et al.
Artificial intelligence (AI) has the potential to significantly enhance human performance across various domains. Ideally, collaboration between humans and AI should result in complementary team performance (CTP) -- a level of performance that neither of them can attain individually. So far, however, CTP has rarely been observed, suggesting an insufficient understanding of the principle and the application of complementarity. Therefore, we develop a general concept of complementarity and formalize its theoretical potential as well as the actual realized effect in decision-making situations. Moreover, we identify information and capability asymmetry as the two key sources of complementarity. Finally, we illustrate the impact of each source on complementarity potential and effect in two empirical studies. Our work provides researchers with a comprehensive theoretical foundation of human-AI complementarity in decision-making and demonstrates that leveraging these sources constitutes a viable pathway towards designing effective human-AI collaboration, i.e., the realization of CTP.
IVApr 23, 2024
Interactive Generation of Laparoscopic Videos with Diffusion ModelsIvan Iliash, Simeon Allmendinger, Felix Meissen et al.
Generative AI, in general, and synthetic visual data generation, in specific, hold much promise for benefiting surgical training by providing photorealism to simulation environments. Current training methods primarily rely on reading materials and observing live surgeries, which can be time-consuming and impractical. In this work, we take a significant step towards improving the training process. Specifically, we use diffusion models in combination with a zero-shot video diffusion method to interactively generate realistic laparoscopic images and videos by specifying a surgical action through text and guiding the generation with tool positions through segmentation masks. We demonstrate the performance of our approach using the publicly available Cholec dataset family and evaluate the fidelity and factual correctness of our generated images using a surgical action recognition model as well as the pixel-wise F1-score for the spatial control of tool generation. We achieve an FID of 38.097 and an F1-score of 0.71.
HCJan 9, 2024
Human Delegation Behavior in Human-AI Collaboration: The Effect of Contextual InformationPhilipp Spitzer, Joshua Holstein, Patrick Hemmer et al.
The integration of artificial intelligence (AI) into human decision-making processes at the workplace presents both opportunities and challenges. One promising approach to leverage existing complementary capabilities is allowing humans to delegate individual instances of decision tasks to AI. However, enabling humans to delegate instances effectively requires them to assess several factors. One key factor is the analysis of both their own capabilities and those of the AI in the context of the given task. In this work, we conduct a behavioral study to explore the effects of providing contextual information to support this delegation decision. Specifically, we investigate how contextual information about the AI and the task domain influence humans' delegation decisions to an AI and their impact on the human-AI team performance. Our findings reveal that access to contextual information significantly improves human-AI team performance in delegation settings. Finally, we show that the delegation behavior changes with the different types of contextual information. Overall, this research advances the understanding of computer-supported, collaborative work and provides actionable insights for designing more effective collaborative systems.
IVDec 5, 2023
Navigating the Synthetic Realm: Harnessing Diffusion-based Models for Laparoscopic Text-to-Image GenerationSimeon Allmendinger, Patrick Hemmer, Moritz Queisner et al.
Recent advances in synthetic imaging open up opportunities for obtaining additional data in the field of surgical imaging. This data can provide reliable supplements supporting surgical applications and decision-making through computer vision. Particularly the field of image-guided surgery, such as laparoscopic and robotic-assisted surgery, benefits strongly from synthetic image datasets and virtual surgical training methods. Our study presents an intuitive approach for generating synthetic laparoscopic images from short text prompts using diffusion-based generative models. We demonstrate the usage of state-of-the-art text-to-image architectures in the context of laparoscopic imaging with regard to the surgical removal of the gallbladder as an example. Results on fidelity and diversity demonstrate that diffusion-based models can acquire knowledge about the style and semantics in the field of image-guided surgery. A validation study with a human assessment survey underlines the realistic nature of our synthetic data, as medical personnel detects actual images in a pool with generated images causing a false-positive rate of 66%. In addition, the investigation of a state-of-the-art machine learning model to recognize surgical actions indicates enhanced results when trained with additional generated images of up to 5.20%. Overall, the achieved image quality contributes to the usage of computer-generated images in surgical applications and enhances its path to maturity.
AIMar 29, 2024
Implications of the AI Act for Non-Discrimination Law and Algorithmic FairnessLuca Deck, Jan-Laurin Müller, Conradin Braun et al.
The topic of fairness in AI, as debated in the FATE (Fairness, Accountability, Transparency, and Ethics in AI) communities, has sparked meaningful discussions in the past years. However, from a legal perspective, particularly from the perspective of European Union law, many open questions remain. Whereas algorithmic fairness aims to mitigate structural inequalities at design-level, European non-discrimination law is tailored to individual cases of discrimination after an AI model has been deployed. The AI Act might present a tremendous step towards bridging these two approaches by shifting non-discrimination responsibilities into the design stage of AI models. Based on an integrative reading of the AI Act, we comment on legal as well as technical enforcement problems and propose practical implications on bias detection and bias correction in order to specify and comply with specific technical requirements.
LGSep 13, 2024
Utilizing Data Fingerprints for Privacy-Preserving Algorithm Selection in Time Series Classification: Performance and Uncertainty Estimation on Unseen DatasetsLars Böcking, Leopold Müller, Niklas Kühl
The selection of algorithms is a crucial step in designing AI services for real-world time series classification use cases. Traditional methods such as neural architecture search, automated machine learning, combined algorithm selection, and hyperparameter optimizations are effective but require considerable computational resources and necessitate access to all data points to run their optimizations. In this work, we introduce a novel data fingerprint that describes any time series classification dataset in a privacy-preserving manner and provides insight into the algorithm selection problem without requiring training on the (unseen) dataset. By decomposing the multi-target regression problem, only our data fingerprints are used to estimate algorithm performance and uncertainty in a scalable and adaptable manner. Our approach is evaluated on the 112 University of California riverside benchmark datasets, demonstrating its effectiveness in predicting the performance of 35 state-of-the-art algorithms and providing valuable insights for effective algorithm selection in time series classification service systems, improving a naive baseline by 7.32% on average in estimating the mean performance and 15.81% in estimating the uncertainty.
LGApr 29, 2024
Mapping the Potential of Explainable AI for Fairness Along the AI LifecycleLuca Deck, Astrid Schomäcker, Timo Speith et al.
The widespread use of artificial intelligence (AI) systems across various domains is increasingly surfacing issues related to algorithmic fairness, especially in high-stakes scenarios. Thus, critical considerations of how fairness in AI systems might be improved -- and what measures are available to aid this process -- are overdue. Many researchers and policymakers see explainable AI (XAI) as a promising way to increase fairness in AI systems. However, there is a wide variety of XAI methods and fairness conceptions expressing different desiderata, and the precise connections between XAI and fairness remain largely nebulous. Besides, different measures to increase algorithmic fairness might be applicable at different points throughout an AI system's lifecycle. Yet, there currently is no coherent mapping of fairness desiderata along the AI lifecycle. In this paper, we we distill eight fairness desiderata, map them along the AI lifecycle, and discuss how XAI could help address each of them. We hope to provide orientation for practical applications and to inspire XAI research specifically focused on these fairness desiderata.
CLApr 22, 2025
Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and ExplainabilityDaniel Hendriks, Philipp Spitzer, Niklas Kühl et al.
Artificial Intelligence (AI) has increasingly influenced modern society, recently in particular through significant advancements in Large Language Models (LLMs). However, high computational and storage demands of LLMs still limit their deployment in resource-constrained environments. Knowledge distillation addresses this challenge by training a small student model from a larger teacher model. Previous research has introduced several distillation methods for both generating training data and for training the student model. Despite their relevance, the effects of state-of-the-art distillation methods on model performance and explainability have not been thoroughly investigated and compared. In this work, we enlarge the set of available methods by applying critique-revision prompting to distillation for data generation and by synthesizing existing methods for training. For these methods, we provide a systematic comparison based on the widely used Commonsense Question-Answering (CQA) dataset. While we measure performance via student model accuracy, we employ a human-grounded study to evaluate explainability. We contribute new distillation methods and their comparison in terms of both performance and explainability. This should further advance the distillation of small language models and, thus, contribute to broader applicability and faster diffusion of LLM technology.
LGApr 9, 2025
Beware of "Explanations" of AIDavid Martens, Galit Shmueli, Theodoros Evgeniou et al.
Understanding the decisions made and actions taken by increasingly complex AI system remains a key challenge. This has led to an expanding field of research in explainable artificial intelligence (XAI), highlighting the potential of explanations to enhance trust, support adoption, and meet regulatory standards. However, the question of what constitutes a "good" explanation is dependent on the goals, stakeholders, and context. At a high level, psychological insights such as the concept of mental model alignment can offer guidance, but success in practice is challenging due to social and technical factors. As a result of this ill-defined nature of the problem, explanations can be of poor quality (e.g. unfaithful, irrelevant, or incoherent), potentially leading to substantial risks. Instead of fostering trust and safety, poorly designed explanations can actually cause harm, including wrong decisions, privacy violations, manipulation, and even reduced AI adoption. Therefore, we caution stakeholders to beware of explanations of AI: while they can be vital, they are not automatically a remedy for transparency or responsible AI adoption, and their misuse or limitations can exacerbate harm. Attention to these caveats can help guide future research to improve the quality and impact of AI explanations.
CYJul 10, 2025
A Multi-Level Strategy for Deepfake Content Moderation under EU RegulationMax-Paul Förster, Luca Deck, Raimund Weidlich et al.
The growing availability and use of deepfake technologies increases risks for democratic societies, e.g., for political communication on online platforms. The EU has responded with transparency obligations for providers and deployers of Artificial Intelligence (AI) systems and online platforms. This includes marking deepfakes during generation and labeling deepfakes when they are shared. However, the lack of industry and enforcement standards poses an ongoing challenge. Through a multivocal literature review, we summarize methods for marking, detecting, and labeling deepfakes and assess their effectiveness under EU regulation. Our results indicate that individual methods fail to meet regulatory and practical requirements. Therefore, we propose a multi-level strategy combining the strengths of existing methods. To account for the masses of content on online platforms, our multi-level strategy provides scalability and practicality via a simple scoring mechanism. At the same time, it is agnostic to types of deepfake technology and allows for context-specific risk weighting.
HCOct 1, 2025
PromptPilot: Improving Human-AI Collaboration Through LLM-Enhanced Prompt EngineeringNiklas Gutheil, Valentin Mayer, Leopold Müller et al.
Effective prompt engineering is critical to realizing the promised productivity gains of large language models (LLMs) in knowledge-intensive tasks. Yet, many users struggle to craft prompts that yield high-quality outputs, limiting the practical benefits of LLMs. Existing approaches, such as prompt handbooks or automated optimization pipelines, either require substantial effort, expert knowledge, or lack interactive guidance. To address this gap, we design and evaluate PromptPilot, an interactive prompting assistant grounded in four empirically derived design objectives for LLM-enhanced prompt engineering. We conducted a randomized controlled experiment with 80 participants completing three realistic, work-related writing tasks. Participants supported by PromptPilot achieved significantly higher performance (median: 78.3 vs. 61.7; p = .045, d = 0.56), and reported enhanced efficiency, ease-of-use, and autonomy during interaction. These findings empirically validate the effectiveness of our proposed design objectives, establishing LLM-enhanced prompt engineering as a viable technique for improving human-AI collaboration.
AIOct 1, 2025
Data Quality Challenges in Retrieval-Augmented GenerationLeopold Müller, Joshua Holstein, Sarah Bause et al.
Organizations increasingly adopt Retrieval-Augmented Generation (RAG) to enhance Large Language Models with enterprise-specific knowledge. However, current data quality (DQ) frameworks have been primarily developed for static datasets, and only inadequately address the dynamic, multi-stage nature of RAG systems. This study aims to develop DQ dimensions for this new type of AI-based systems. We conduct 16 semi-structured interviews with practitioners of leading IT service companies. Through a qualitative content analysis, we inductively derive 15 distinct DQ dimensions across the four processing stages of RAG systems: data extraction, data transformation, prompt & search, and generation. Our findings reveal that (1) new dimensions have to be added to traditional DQ frameworks to also cover RAG contexts; (2) these new dimensions are concentrated in early RAG steps, suggesting the need for front-loaded quality management strategies, and (3) DQ issues transform and propagate through the RAG pipeline, necessitating a dynamic, step-aware approach to quality management.
CVAug 4, 2025
Do Edges Matter? Investigating Edge-Enhanced Pre-Training for Medical Image SegmentationPaul Zaha, Lars Böcking, Simeon Allmendinger et al.
Medical image segmentation is crucial for disease diagnosis and treatment planning, yet developing robust segmentation models often requires substantial computational resources and large datasets. Existing research shows that pre-trained and finetuned foundation models can boost segmentation performance. However, questions remain about how particular image preprocessing steps may influence segmentation performance across different medical imaging modalities. In particular, edges-abrupt transitions in pixel intensity-are widely acknowledged as vital cues for object boundaries but have not been systematically examined in the pre-training of foundation models. We address this gap by investigating to which extend pre-training with data processed using computationally efficient edge kernels, such as kirsch, can improve cross-modality segmentation capabilities of a foundation model. Two versions of a foundation model are first trained on either raw or edge-enhanced data across multiple medical imaging modalities, then finetuned on selected raw subsets tailored to specific medical modalities. After systematic investigation using the medical domains Dermoscopy, Fundus, Mammography, Microscopy, OCT, US, and XRay, we discover both increased and reduced segmentation performance across modalities using edge-focused pre-training, indicating the need for a selective application of this approach. To guide such selective applications, we propose a meta-learning strategy. It uses standard deviation and image entropy of the raw image to choose between a model pre-trained on edge-enhanced or on raw data for optimal performance. Our experiments show that integrating this meta-learning layer yields an overall segmentation performance improvement across diverse medical imaging tasks by 16.42% compared to models pre-trained on edge-enhanced data only and 19.30% compared to models pre-trained on raw data only.
LGFeb 22, 2025
Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome PredictionSarah Ball, Simeon Allmendinger, Frauke Kreuter et al.
Generative AI (GenAI) is increasingly used in survey contexts to simulate human preferences. While many research endeavors evaluate the quality of synthetic GenAI data by comparing model-generated responses to gold-standard survey results, fundamental questions about the validity and reliability of using LLMs as substitutes for human respondents remain. Our study provides a technical analysis of how demographic attributes and prompt variations influence latent opinion mappings in large language models (LLMs) and evaluates their suitability for survey-based predictions. Using 14 different models, we find that LLM-generated data fails to replicate the variance observed in real-world human responses, particularly across demographic subgroups. In the political space, persona-to-party mappings exhibit limited differentiation, resulting in synthetic data that lacks the nuanced distribution of opinions found in survey data. Moreover, we show that prompt sensitivity can significantly alter outputs for some models, further undermining the stability and predictiveness of LLM-based simulations. As a key contribution, we adapt a probe-based methodology that reveals how LLMs encode political affiliations in their latent space, exposing the systematic distortions introduced by these models. Our findings highlight critical limitations in AI-generated survey data, urging caution in its use for public opinion research, social science experimentation, and computational behavioral modeling.
LGJan 8, 2025
Towards a Problem-Oriented Domain Adaptation Framework for Machine LearningPhilipp Spitzer, Dominik Martin, Laurin Eichberger et al.
Domain adaptation is a sub-field of machine learning that involves transferring knowledge from a source domain to perform the same task in the target domain. It is a typical challenge in machine learning that arises, e.g., when data is obtained from various sources or when using a data basis that changes over time. Recent advances in the field offer promising methods, but it is still challenging for researchers and practitioners to determine if domain adaptation is suitable for a given problem -- and, subsequently, to select the appropriate approach. This article employs design science research to develop a problem-oriented framework for domain adaptation, which is matured in three evaluation episodes. We describe a framework that distinguishes between five domain adaptation scenarios, provides recommendations for addressing each scenario, and offers guidelines for determining if a problem falls into one of these scenarios. During the multiple evaluation episodes, the framework is tested on artificial and real-world datasets and an experimental study involving 100 participants. The evaluation demonstrates that the framework has the explanatory power to capture any domain adaptation problem effectively. In summary, we provide clear guidance for researchers and practitioners who want to employ domain adaptation but lack in-depth knowledge of the possibilities.
LGJun 20, 2024
CollaFuse: Collaborative Diffusion ModelsSimeon Allmendinger, Domenique Zipperling, Lukas Struppek et al.
In the landscape of generative artificial intelligence, diffusion-based models have emerged as a promising method for generating synthetic images. However, the application of diffusion models poses numerous challenges, particularly concerning data availability, computational requirements, and privacy. Traditional approaches to address these shortcomings, like federated learning, often impose significant computational burdens on individual clients, especially those with constrained resources. In response to these challenges, we introduce a novel approach for distributed collaborative diffusion models inspired by split learning. Our approach facilitates collaborative training of diffusion models while alleviating client computational burdens during image synthesis. This reduced computational burden is achieved by retaining data and computationally inexpensive processes locally at each client while outsourcing the computationally expensive processes to shared, more efficient server resources. Through experiments on the common CelebA dataset, our approach demonstrates enhanced privacy by reducing the necessity for sharing raw data. These capabilities hold significant potential across various application areas, including the design of edge computing solutions. Thus, our work advances distributed machine learning by contributing to the evolution of collaborative diffusion models.
AIJun 18, 2024
Investigating the Role of Explainability and AI Literacy in User ComplianceNiklas Kühl, Christian Meske, Maximilian Nitsche et al.
AI is becoming increasingly common across different domains. However, as sophisticated AI-based systems are often black-boxed, rendering the decision-making logic opaque, users find it challenging to comply with their recommendations. Although researchers are investigating Explainable AI (XAI) to increase the transparency of the underlying machine learning models, it is unclear what types of explanations are effective and what other factors increase compliance. To better understand the interplay of these factors, we conducted an experiment with 562 participants who were presented with the recommendations of an AI and two different types of XAI. We find that users' compliance increases with the introduction of XAI but is also affected by AI literacy. We also find that the relationships between AI literacy XAI and users' compliance are mediated by the users' mental model of AI. Our study has several implications for successfully designing AI-based systems utilizing XAI.
HCJun 3, 2024
Transferring Domain Knowledge with (X)AI-Based Learning SystemsPhilipp Spitzer, Niklas Kühl, Marc Goutier et al.
In numerous high-stakes domains, training novices via conventional learning systems does not suffice. To impart tacit knowledge, experts' hands-on guidance is imperative. However, training novices by experts is costly and time-consuming, increasing the need for alternatives. Explainable artificial intelligence (XAI) has conventionally been used to make black-box artificial intelligence systems interpretable. In this work, we utilize XAI as an alternative: An (X)AI system is trained on experts' past decisions and is then employed to teach novices by providing examples coupled with explanations. In a study with 249 participants, we measure the effectiveness of such an approach for a classification task. We show that (X)AI-based learning systems are able to induce learning in novices and that their cognitive styles moderate learning. Thus, we take the first steps to reveal the impact of XAI on human learning and point AI developers to future options to tailor the design of (X)AI-based learning systems.
LGFeb 29, 2024
CollaFuse: Navigating Limited Resources and Privacy in Collaborative Generative AIDomenique Zipperling, Simeon Allmendinger, Lukas Struppek et al.
In the landscape of generative artificial intelligence, diffusion-based models present challenges for socio-technical systems in data requirements and privacy. Traditional approaches like federated learning distribute the learning process but strain individual clients, especially with constrained resources (e.g., edge devices). In response to these challenges, we introduce CollaFuse, a novel framework inspired by split learning. Tailored for efficient and collaborative use of denoising diffusion probabilistic models, CollaFuse enables shared server training and inference, alleviating client computational burdens. This is achieved by retaining data and computationally inexpensive GPU processes locally at each client while outsourcing the computationally expensive processes to the shared server. Demonstrated in a healthcare context, CollaFuse enhances privacy by highly reducing the need for sensitive information sharing. These capabilities hold the potential to impact various application areas, such as the design of edge computing solutions, healthcare research, or autonomous driving. In essence, our work advances distributed machine learning, shaping the future of collaborative GenAI networks.
CVSep 3, 2023
AB2CD: AI for Building Climate Damage Classification and DetectionMaximilian Nitsche, S. Karthik Mukkavilli, Niklas Kühl et al.
We explore the implementation of deep learning techniques for precise building damage assessment in the context of natural hazards, utilizing remote sensing data. The xBD dataset, comprising diverse disaster events from across the globe, serves as the primary focus, facilitating the evaluation of deep learning models. We tackle the challenges of generalization to novel disasters and regions while accounting for the influence of low-quality and noisy labels inherent in natural hazard data. Furthermore, our investigation quantitatively establishes that the minimum satellite imagery resolution essential for effective building damage detection is 3 meters and below 1 meter for classification using symmetric and asymmetric resolution perturbation analyses. To achieve robust and accurate evaluations of building damage detection and classification, we evaluated different deep learning models with residual, squeeze and excitation, and dual path network backbones, as well as ensemble techniques. Overall, the U-Net Siamese network ensemble with F-1 score of 0.812 performed the best against the xView2 challenge benchmark. Additionally, we evaluate a Universal model trained on all hazards against a flood expert model and investigate generalization gaps across events, and out of distribution from field data in the Ahr Valley. Our research findings showcase the potential and limitations of advanced AI solutions in enhancing the impact assessment of climate change-induced extreme weather events, such as floods and hurricanes. These insights have implications for disaster impact assessment in the face of escalating climate challenges.
HCMay 12, 2023
ML-Based Teaching Systems: A Conceptual FrameworkPhilipp Spitzer, Niklas Kühl, Daniel Heinz et al.
As the shortage of skilled workers continues to be a pressing issue, exacerbated by demographic change, it is becoming a critical challenge for organizations to preserve the knowledge of retiring experts and to pass it on to novices. While this knowledge transfer has traditionally taken place through personal interaction, it lacks scalability and requires significant resources and time. IT-based teaching systems have addressed this scalability issue, but their development is still tedious and time-consuming. In this work, we investigate the potential of machine learning (ML) models to facilitate knowledge transfer in an organizational context, leading to more cost-effective IT-based teaching systems. Through a systematic literature review, we examine key concepts, themes, and dimensions to better understand and design ML-based teaching systems. To do so, we capture and consolidate the capabilities of ML models in IT-based teaching systems, inductively analyze relevant concepts in this context, and determine their interrelationships. We present our findings in the form of a review of the key concepts, themes, and dimensions to understand and inform on ML-based teaching systems. Building on these results, our work contributes to research on computer-supported cooperative work by conceptualizing how ML-based teaching systems can preserve expert knowledge and facilitate its transfer from SMEs to human novices. In this way, we shed light on this emerging subfield of human-computer interaction and serve to build an interdisciplinary research agenda.
LGOct 18, 2021
Utilizing Active Machine Learning for Quality Assurance: A Case Study of Virtual Car Renderings in the Automotive IndustryPatrick Hemmer, Niklas Kühl, Jakob Schöffer
Computer-generated imagery of car models has become an indispensable part of car manufacturers' advertisement concepts. They are for instance used in car configurators to offer customers the possibility to configure their car online according to their personal preferences. However, human-led quality assurance faces the challenge to keep up with high-volume visual inspections due to the car models' increasing complexity. Even though the application of machine learning to many visual inspection tasks has demonstrated great success, its need for large labeled data sets remains a central barrier to using such systems in practice. In this paper, we propose an active machine learning-based quality assurance system that requires significantly fewer labeled instances to identify defective virtual car renderings without compromising performance. By employing our system at a German automotive manufacturer, start-up difficulties can be overcome, the inspection process efficiency can be increased, and thus economic advantages can be realized.