ROJul 23, 2022
Robots Enact Malignant StereotypesAndrew Hundt, William Agnew, Vicky Zeng et al. · cmu
Stereotypes, bias, and discrimination have been extensively documented in Machine Learning (ML) methods such as Computer Vision (CV) [18, 80], Natural Language Processing (NLP) [6], or both, in the case of large image and caption models such as OpenAI CLIP [14]. In this paper, we evaluate how ML bias manifests in robots that physically and autonomously act within the world. We audit one of several recently published CLIP-powered robotic manipulation methods, presenting it with objects that have pictures of human faces on the surface which vary across race and gender, alongside task descriptions that contain terms associated with common stereotypes. Our experiments definitively show robots acting out toxic stereotypes with respect to gender, race, and scientifically-discredited physiognomy, at scale. Furthermore, the audited methods are less likely to recognize Women and People of Color. Our interdisciplinary sociotechnical analysis synthesizes across fields and applications such as Science Technology and Society (STS), Critical Studies, History, Safety, Robotics, and AI. We find that robots powered by large datasets and Dissolution Models (sometimes called "foundation models", e.g. CLIP) that contain humans risk physically amplifying malignant stereotypes in general; and that merely correcting disparities will be insufficient for the complexity and scale of the problem. Instead, we recommend that robot learning methods that physically manifest stereotypes or other harmful outcomes be paused, reworked, or even wound down when appropriate, until outcomes can be proven safe, effective, and just. Finally, we discuss comprehensive policy changes and the potential of new interdisciplinary research on topics like Identity Safety Assessment Frameworks and Design Justice to better understand and address these harms.
SEAug 12, 2024
Targeted Deep Learning System Boundary TestingOliver Weißl, Amr Abdellatif, Xingcheng Chen et al.
Evaluating the behavioral boundaries of deep learning (DL) systems is crucial for understanding their reliability across diverse, unseen inputs. Existing solutions fall short as they rely on untargeted random, model- or latent-based perturbations, due to difficulties in generating controlled input variations. In this work, we introduce Mimicry, a novel black-box test generator for fine-grained, targeted exploration of DL system boundaries. Mimicry performs boundary testing by leveraging the probabilistic nature of DL outputs to identify promising directions for exploration. It uses style-based GANs to disentangle input representations into content and style components, enabling controlled feature mixing to approximate the decision boundary. We evaluated Mimicry's effectiveness in generating boundary inputs for five widely used DL image classification systems of increasing complexity, comparing it to two baseline approaches. Our results show that Mimicry consistently identifies inputs closer to the decision boundary. It generates semantically meaningful boundary test cases that reveal new functional (mis)behaviors, while the baselines produce mainly corrupted or invalid inputs. Thanks to its enhanced control over latent space manipulations, Mimicry remains effective as dataset complexity increases, maintaining competitive diversity and higher validity rates, confirmed by human assessors.
LGJan 21
HyperNet-Adaptation for Diffusion-Based Test Case GenerationOliver Weißl, Vincenzo Riccio, Severin Kacianka et al.
The increasing deployment of deep learning systems requires systematic evaluation of their reliability in real-world scenarios. Traditional gradient-based adversarial attacks introduce small perturbations that rarely correspond to realistic failures and mainly assess robustness rather than functional behavior. Generative test generation methods offer an alternative but are often limited to simple datasets or constrained input domains. Although diffusion models enable high-fidelity image synthesis, their computational cost and limited controllability restrict their applicability to large-scale testing. We present HyNeA, a generative testing method that enables direct and efficient control over diffusion-based generation. HyNeA provides dataset-free controllability through hypernetworks, allowing targeted manipulation of the generative process without relying on architecture-specific conditioning mechanisms or dataset-driven adaptations such as fine-tuning. HyNeA employs a distinct training strategy that supports instance-level tuning to identify failure-inducing test cases without requiring datasets that explicitly contain examples of similar failures. This approach enables the targeted generation of realistic failure cases at substantially lower computational cost than search-based methods. Experimental results show that HyNeA improves controllability and test diversity compared to existing generative test generators and generalizes to domains where failure-labeled training data is unavailable.
AIFeb 13, 2024
Towards Equitable Agile Research and Development of AI and RoboticsAndrew Hundt, Julia Schuller, Severin Kacianka · cmu
Machine Learning (ML) and 'Artificial Intelligence' ('AI') methods tend to replicate and amplify existing biases and prejudices, as do Robots with AI. For example, robots with facial recognition have failed to identify Black Women as human, while others have categorized people, such as Black Men, as criminals based on appearance alone. A 'culture of modularity' means harms are perceived as 'out of scope', or someone else's responsibility, throughout employment positions in the 'AI supply chain'. Incidents are routine enough (incidentdatabase.ai lists over 2000 examples) to indicate that few organizations are capable of completely respecting peoples' rights; meeting claimed equity, diversity, and inclusion (EDI or DEI) goals; or recognizing and then addressing such failures in their organizations and artifacts. We propose a framework for adapting widely practiced Research and Development (R&D) project management methodologies to build organizational equity capabilities and better integrate known evidence-based best practices. We describe how project teams can organize and operationalize the most promising practices, skill sets, organizational cultures, and methods to detect and address rights-based fairness, equity, accountability, and ethical problems as early as possible when they are often less harmful and easier to mitigate; then monitor for unforeseen incidents to adaptively and constructively address them. Our primary example adapts an Agile development process based on Scrum, one of the most widely adopted approaches to organizing R&D teams. We also discuss limitations of our proposed framework and future research directions.
SEJul 15, 2021
Empowered and Embedded: Ethics and Agile ProcessesNiina Zuber, Severin Kacianka, Jan Gogoll et al.
In this article we focus on the structural aspects of the development of ethical software, and argue that ethical considerations need to be embedded into the (agile) software development process. In fact, we claim that agile processes of software development lend themselves specifically well for this endeavour. First, we contend that ethical evaluations need to go beyond the use of software products and include an evaluation of the software itself. This implies that software engineers influence peoples' lives through the features of their designed products. Embedded values are thus approached best by software engineers themselves. Therefore, we put emphasis on the possibility to implement ethical deliberations in already existing and well established agile software development processes. Our approach relies on software engineers making their own judgments throughout the entire development process to ensure that technical features and ethical evaluation can be addressed adequately to transport and foster desirable values and norms. We argue that agile software development processes may help the implementation of ethical deliberation for five reasons: 1) agile methods are widely spread, 2) their emphasis on flat hierarchies promotes independent thinking, 3) their reliance on existing team structures serve as an incubator for deliberation, 4) agile development enhances object-focused techno-ethical realism, and, finally, 5) agile structures provide a salient endpoint to deliberation.
SEJan 20, 2021
Designing Accountable SystemsSeverin Kacianka, Alexander Pretschner
Accountability is an often called for property of technical systems. It is a requirement for algorithmic decision systems, autonomous cyber-physical systems, and for software systems in general. As a concept, accountability goes back to the early history of Liberalism and is suggested as a tool to limit the use of power. This long history has also given us many, often slightly differing, definitions of accountability. The problem that software developers now face is to understand what accountability means for their systems and how to reflect it in a system's design. To enable the rigorous study of accountability in a system, we need models that are suitable for capturing such a varied concept. In this paper, we present a method to express and compare different definitions of accountability using Structural Causal Models. We show how these models can be used to evaluate a system's design and present a small use case based on an autonomous car.
SENov 5, 2020
Ethics in the Software Development Process: From Codes of Conduct to Ethical DeliberationJan Gogoll, Niina Zuber, Severin Kacianka et al.
Software systems play an ever more important role in our lives and software engineers and their companies find themselves in a position where they are held responsible for ethical issues that may arise. In this paper, we try to disentangle ethical considerations that can be performed at the level of the software engineer from those that belong in the wider domain of business ethics. The handling of ethical problems that fall into the responsibility of the engineer have traditionally been addressed by the publication of Codes of Ethics and Conduct. We argue that these Codes are barely able to provide normative orientation in software development. The main contribution of this paper is, thus, to analyze the normative features of Codes of Ethics in software engineering and to explicate how their value-based approach might prevent their usefulness from a normative perspective. Codes of Conduct cannot replace ethical deliberation because they do not and cannot offer guidance because of their underdetermined nature. This lack of orientation, we argue, triggers reactive behavior such as "cherry-picking", "risk of indifference", "ex-post orientation" and the "desire to rely on gut feeling". In the light of this, we propose to implement ethical deliberation within software development teams as a way out.
SEMay 7, 2020
Expressing Accountability Patterns using Structural Causal ModelsSeverin Kacianka, Amjad Ibrahim, Alexander Pretschner
While the exact definition and implementation of accountability depend on the specific context, at its core accountability describes a mechanism that will make decisions transparent and often provides means to sanction "bad" decisions. As such, accountability is specifically relevant for Cyber-Physical Systems, such as robots or drones, that embed themselves into a human society, take decisions and might cause lasting harm. Without a notion of accountability, such systems could behave with impunity and would not fit into society. Despite its relevance, there is currently no agreement on its meaning and, more importantly, no way to express accountability properties for these systems. As a solution we propose to express the accountability properties of systems using Structural Causal Models. They can be represented as human-readable graphical models while also offering mathematical tools to analyze and reason over them. Our central contribution is to show how Structural Causal Models can be used to express and analyze the accountability properties of systems and that this approach allows us to identify accountability patterns. These accountability patterns can be catalogued and used to improve systems and their architectures.
AIOct 31, 2019
Extending Causal Models from Machines into HumansSeverin Kacianka, Amjad Ibrahim, Alexander Pretschner et al.
Causal Models are increasingly suggested as a means to reason about the behavior of cyber-physical systems in socio-technical contexts. They allow us to analyze courses of events and reason about possible alternatives. Until now, however, such reasoning is confined to the technical domain and limited to single systems or at most groups of systems. The humans that are an integral part of any such socio-technical system are usually ignored or dealt with by "expert judgment". We show how a technical causal model can be extended with models of human behavior to cover the complexity and interplay between humans and technical systems. This integrated socio-technical causal model can then be used to reason not only about actions and decisions taken by the machine, but also about those taken by humans interacting with the system. In this paper we demonstrate the feasibility of merging causal models about machines with causal models about humans and illustrate the usefulness of this approach with a highly automated vehicle example.
CRNov 27, 2018
A Real-Time Remote IDS Testbed for Connected VehiclesValentin Zieglmeier, Severin Kacianka, Thomas Hutzelmann et al.
Connected vehicles are becoming commonplace. A constant connection between vehicles and a central server enables new features and services. This added connectivity raises the likelihood of exposure to attackers and risks unauthorized access. A possible countermeasure to this issue are intrusion detection systems (IDS), which aim at detecting these intrusions during or after their occurrence. The problem with IDS is the large variety of possible approaches with no sensible option for comparing them. Our contribution to this problem comprises the conceptualization and implementation of a testbed for an automotive real-world scenario. That amounts to a server-side IDS detecting intrusions into vehicles remotely. To verify the validity of our approach, we evaluate the testbed from multiple perspectives, including its fitness for purpose and the quality of the data it generates. Our evaluation shows that the testbed makes the effective assessment of various IDS possible. It solves multiple problems of existing approaches, including class imbalance. Additionally, it enables reproducibility and generating data of varying detection difficulties. This allows for comprehensive evaluation of real-time, remote IDS.
SEOct 23, 2018
Understanding and Formalizing Accountability for Cyber-Physical SystemsSeverin Kacianka, Alexander Pretschner
Accountability is the property of a system that enables the uncovering of causes for events and helps understand who or what is responsible for these events. Definitions and interpretations of accountability differ; however, they are typically expressed in natural language that obscures design decisions and the impact on the overall system. This paper presents a formal model to express the accountability properties of cyber-physical systems. To illustrate the usefulness of our approach, we demonstrate how three different interpretations of accountability can be expressed using the proposed model and describe the implementation implications through a case study. This formal model can be used to highlight context specific-elements of accountability mechanisms, define their capabilities, and express different notions of accountability. In addition, it makes design decisions explicit and facilitates discussion, analysis and comparison of different approaches.
CYAug 29, 2016
Towards a Unified Model of Accountability InfrastructuresSeverin Kacianka, Florian Kelbert, Alexander Pretschner
Accountability aims to provide explanations for why unwanted situations occurred, thus providing means to assign responsibility and liability. As such, accountability has slightly different meanings across the sciences. In computer science, our focus is on providing explanations for technical systems, in particular if they interact with their physical environment using sensors and actuators and may do serious harm. Accountability is relevant when considering safety, security and privacy properties and we realize that all these incarnations are facets of the same core idea. Hence, in this paper we motivate and propose a model for accountability infrastructures that is expressive enough to capture all of these domains. At its core, this model leverages formal causality models from the literature in order to provide a solid reasoning framework. We show how this model can be instantiated for several real-world use cases.