HC · Feb 16, 2023
Human-Centered Responsible Artificial Intelligence: Current & Future Trends
Mohammad Tahaei, Marios Constantinides, Daniele Quercia et al.
In recent years, the CHI community has seen significant growth in research on Human-Centered Responsible Artificial Intelligence. While different research communities may use different terminology to discuss similar topics, all of this work is ultimately aimed at developing AI that benefits humanity while being grounded in human rights and ethics, and reducing the potential harms of AI. In this special interest group, we aim to bring together researchers from academia and industry interested in these topics to map current and future research trends to advance this important area of research by fostering collaboration and sharing ideas.
HC · Feb 10, 2023
A Systematic Literature Review of Human-Centered, Ethical, and Responsible AI
Mohammad Tahaei, Marios Constantinides, Daniele Quercia et al.
As Artificial Intelligence (AI) continues to advance rapidly, it becomes increasingly important to consider AI's ethical and societal implications. In this paper, we present a bottom-up mapping of the current state of research at the intersection of Human-Centered AI, Ethical, and Responsible AI (HCER-AI) by thematically reviewing and analyzing 164 research papers from leading conferences in ethical, social, and human factors of AI: AIES, CHI, CSCW, and FAccT. The ongoing research in HCER-AI places emphasis on governance, fairness, and explainability. These conferences, however, concentrate on specific themes rather than encompassing all aspects. While AIES has fewer papers on HCER-AI, it emphasizes governance and rarely publishes papers about privacy, security, and human flourishing. FAccT publishes more on governance and lacks papers on privacy, security, and human flourishing. CHI and CSCW, as more established conferences, have a broader research portfolio. We find that the current emphasis on governance and fairness in AI research may not adequately address the potential unforeseen and unknown implications of AI. Therefore, we recommend that future research should expand its scope and diversify resources to prepare for these potential consequences. This could involve exploring additional areas such as privacy, security, human flourishing, and explainability.
HC · Jan 13, 2023
Toward General Design Principles for Generative AI Applications
Justin D. Weisz, Michael Muller, Jessica He et al.
Generative AI technologies are growing in power, utility, and use. As generative technologies are being incorporated into mainstream applications, there is a need for guidance on how to design those applications to foster productive and safe use. Based on recent research on human-AI co-creation within the HCI and AI communities, we present a set of seven principles for the design of generative AI applications. These principles are grounded in an environment of generative variability. Six principles are focused on designing for characteristics of generative AI: multiple outcomes & imperfection; exploration & control; and mental models & explanations. In addition, we urge designers to design against potential harms that may be caused by a generative model's hazardous output, misuse, or potential for human displacement. We anticipate these principles to usefully inform design decisions made in the creation of novel human-AI applications, and we invite the community to apply, revise, and extend these principles to their own work.
CY · Jan 13, 2023
A Case Study in Engineering a Conversational Programming Assistant's Persona
Steven I. Ross, Michael Muller, Fernando Martinez et al.
The Programmer's Assistant is an experimental prototype software development environment that integrates a chatbot with a code editor. Conversational capability was achieved by using an existing code-fluent Large Language Model and providing it with a prompt that establishes a conversational interaction pattern, a set of conventions, and a style of interaction appropriate for the application. A discussion of the evolution of the prompt provides a case study in how to coax an existing foundation model to behave in a desirable manner for a particular application.
CY · Jul 26, 2024
Surveys Considered Harmful? Reflecting on the Use of Surveys in AI Research, Development, and Governance
Mohammad Tahaei, Daricia Wilkinson, Alisa Frik et al.
Calls for engagement with the public in Artificial Intelligence (AI) research, development, and governance are increasing, leading to the use of surveys to capture people's values, perceptions, and experiences related to AI. In this paper, we critically examine the state of human participant surveys associated with these topics. Through both a reflexive analysis of a survey pilot spanning six countries and a systematic literature review of 44 papers featuring public surveys related to AI, we explore prominent perspectives and methodological nuances associated with surveys to date. We find that public surveys on AI topics are vulnerable to specific Western knowledge, values, and assumptions in their design, including in their positioning of ethical concepts and societal values, lack sufficient critical discourse surrounding deployment strategies, and demonstrate inconsistent forms of transparency in their reporting. Based on our findings, we distill provocations and heuristic questions for our community, to recognize the limitations of surveys for meeting the goals of engagement, and to cultivate shared principles to design, deploy, and interpret surveys cautiously and responsibly.
HC · Jan 25, 2024
Design Principles for Generative AI Applications
Justin D. Weisz, Jessica He, Michael Muller et al.
Generative AI applications present unique design challenges. As generative AI technologies are increasingly being incorporated into mainstream applications, there is an urgent need for guidance on how to design user experiences that foster effective and safe use. We present six principles for the design of generative AI applications that address unique characteristics of generative AI UX and offer new interpretations and extensions of known issues in the design of AI applications. Each principle is coupled with a set of design strategies for implementing that principle via UX capabilities or through the design process. The principles and strategies were developed through an iterative process involving literature review, feedback from design practitioners, validation against real-world generative AI applications, and incorporation into the design process of two generative AI applications. We anticipate the principles to usefully inform the design of generative AI applications by driving actionable design recommendations.
HC · Feb 15, 2022
Better Together? An Evaluation of AI-Supported Code Translation
Justin D. Weisz, Michael Muller, Steven I. Ross et al.
Generative machine learning models have recently been applied to source code, for use cases including translating code between programming languages, creating documentation from code, and auto-completing methods. Yet, state-of-the-art models often produce code that is erroneous or incomplete. In a controlled study with 32 software engineers, we examined whether such imperfect outputs are helpful in the context of Java-to-Python code translation. When aided by the outputs of a code translation model, participants produced code with fewer errors than when working alone. We also examined how the quality and quantity of AI translations affected the work process and quality of outcomes, and observed that providing multiple translations had a larger impact on the translation process than varying the quality of provided translations. Our results tell a complex, nuanced story about the benefits of generative code models and the challenges software engineers face when working with their outputs. Our work motivates the need for intelligent user interfaces that help software engineers effectively work with generative code models in order to understand and evaluate their outputs and achieve superior outcomes to working alone.
HC · Feb 10, 2022
Investigating Explainability of Generative AI for Code through Scenario-based Design
Jiao Sun, Q. Vera Liao, Michael Muller et al.
What does it mean for a generative AI model to be explainable? The emergent discipline of explainable AI (XAI) has made great strides in helping people understand discriminative models. Less attention has been paid to generative models that produce artifacts, rather than decisions, as output. Meanwhile, generative AI (GenAI) technologies are maturing and being applied to application domains such as software engineering. Using scenario-based design and question-driven XAI design approaches, we explore users' explainability needs for GenAI in three software engineering use cases: natural language to code, code translation, and code auto-completion. We conducted 9 workshops with 43 software engineers in which real examples from state-of-the-art generative AI models were used to elicit users' explainability needs. Drawing from prior work, we also propose 4 types of XAI features for GenAI for code and gather additional design ideas from participants. Our work explores explainability needs for GenAI for code and demonstrates how human-centered approaches can drive the technical development of XAI in novel domains.
CL · Oct 11, 2021
Using Document Similarity Methods to create Parallel Datasets for Code Translation
Mayank Agarwal, Kartik Talamadupula, Fernando Martinez et al.
Translating source code from one programming language to another is a critical, time-consuming task in modernizing legacy applications and codebases. Recent work in this space has drawn inspiration from the software naturalness hypothesis by applying natural language processing techniques towards automating the code translation task. However, due to the paucity of parallel data in this domain, supervised techniques have only been applied to a limited set of popular programming languages. To bypass this limitation, unsupervised neural machine translation techniques have been proposed to learn code translation using only monolingual corpora. In this work, we propose to use document similarity methods to create noisy parallel datasets of code, thus enabling supervised techniques to be applied for automated code translation without having to rely on the availability or expensive curation of parallel code datasets. We explore the noise tolerance of models trained on such automatically-created datasets and show that these models perform comparably to models trained on ground truth for reasonable levels of noise. Finally, we exhibit the practical utility of the proposed method by creating parallel datasets for languages beyond the ones explored in prior work, thus expanding the set of programming languages for automated code translation.
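The idea of pairing code by document similarity can be illustrated with a minimal sketch. The abstract does not say which similarity methods the authors use, so everything here is an assumption for illustration: a bag-of-tokens cosine similarity, a hypothetical `build_noisy_pairs` helper, and an arbitrary `threshold` of 0.3. Each source snippet is matched to its most similar target snippet, and low-scoring matches are dropped.

```python
import math
import re
from collections import Counter


def tokenize(code):
    """Split code into lowercase alphabetic/underscore tokens (a crude, assumed tokenizer)."""
    return [t.lower() for t in re.findall(r"[A-Za-z_]+", code)]


def cosine(a, b):
    """Cosine similarity between two token-count vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_noisy_pairs(source_docs, target_docs, threshold=0.3):
    """Pair each source snippet with its most similar target snippet.

    Returns (source_index, target_index, score) tuples. The pairs are
    "noisy" because a high similarity score does not guarantee that the
    two snippets are true translations of each other.
    """
    if not target_docs:
        return []
    target_vecs = [Counter(tokenize(doc)) for doc in target_docs]
    pairs = []
    for i, src in enumerate(source_docs):
        src_vec = Counter(tokenize(src))
        scores = [cosine(src_vec, tgt_vec) for tgt_vec in target_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((i, best, scores[best]))
    return pairs


# Hypothetical toy corpora: two Java functions and two Python functions.
java_docs = [
    "int addNumbers(int a, int b) { return a + b; }",
    'String greetUser(String name) { return "Hello " + name; }',
]
python_docs = [
    "def greet_user(name): return 'Hello ' + name",
    "def add_numbers(a, b): return a + b",
]

for src_idx, tgt_idx, score in build_noisy_pairs(java_docs, python_docs):
    print(f"java[{src_idx}] <-> python[{tgt_idx}] (similarity {score:.2f})")
```

Raising `threshold` trades dataset size for fewer mismatched pairs, which is the kind of noise level the abstract says the trained models were evaluated against.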
HC · Oct 3, 2021
Organizational Distance Also Matters: How Organizational Distance Among Industrial Research Teams Affect Their Research Productivity
Dakuo Wang, Michael Muller, Qian Yang et al.
Geographically distributed teams often face challenges in coordination and collaboration, lowering their productivity. Understanding the relationship between team dispersion and productivity is critical for supporting such teams. Extensive prior research has studied these relations in lab settings or using qualitative measures. This paper extends prior work by contributing an empirical case study in a real-world organization, using quantitative measures. We studied 117 new research project teams from the same discipline within an industrial research lab for 6 months. During this time, all teams shared one goal: submitting research papers to the same target conference. We analyzed these teams' dispersion-related characteristics as well as team productivity. Interestingly, we found little statistical evidence that geographic and time differences relate to team productivity. However, organizational and functional distances are predictive of the productivity of the dispersed teams we studied. We discuss the open research questions these findings revealed and their implications for future research.
HC · Jul 28, 2021
The Who in XAI: How AI Background Shapes Perceptions of AI Explanations
Upol Ehsan, Samir Passi, Q. Vera Liao et al.
Explainability of AI systems is critical for users to take informed actions. Understanding "who" opens the black-box of AI is just as important as opening it. We conduct a mixed-methods study of how two different groups (people with and without AI background) perceive different types of AI explanations. Quantitatively, we share user perceptions along five dimensions. Qualitatively, we describe how AI background can influence interpretations, elucidating the differences through lenses of appropriation and cognitive heuristics. We find that (1) both groups showed unwarranted faith in numbers for different reasons and (2) each group found value in different explanations beyond their intended design. Carrying critical implications for the field of XAI, our findings showcase how AI-generated explanations can have negative consequences despite best intentions and how that could lead to harmful manipulation of trust. We propose design interventions to mitigate them.
HC · Apr 9, 2021
Increasing the Speed and Accuracy of Data Labeling Through an AI Assisted Interface
Michael Desmond, Zahra Ashktorab, Michelle Brachman et al.
Labeling data is an important step in the supervised machine learning lifecycle. It is a laborious human activity comprised of repeated decision making: the human labeler decides which of several potential labels to apply to each example. Prior work has shown that providing AI assistance can improve the accuracy of binary decision tasks. However, the role of AI assistance in more complex data-labeling scenarios with a larger set of labels has not yet been explored. We designed an AI labeling assistant that uses a semi-supervised learning algorithm to predict the most probable labels for each example. We leverage these predictions to provide assistance in two ways: (i) providing a label recommendation and (ii) reducing the labeler's decision space by focusing their attention on only the most probable labels. We conducted a user study (n=54) to evaluate an AI-assisted interface for data labeling in this context. Our results highlight that the AI assistance improves both labeler accuracy and speed, especially when the labeler finds the correct label in the reduced label space. We discuss findings related to the presentation of AI assistance and design implications for intelligent labeling interfaces.
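The two forms of assistance described above, a single label recommendation and a reduced decision space, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' system: the per-label probabilities are taken as given (the semi-supervised model that produces them is not implemented here), and the `assist` function, its `k` parameter, and the example labels are hypothetical.

```python
def assist(probabilities, k=3):
    """Turn predicted label probabilities into labeling assistance.

    probabilities: dict mapping each candidate label to a predicted
    probability (assumed to come from some upstream model).
    Returns the top label as a recommendation plus a shortlist of the
    k most probable labels, which narrows the labeler's decision space.
    """
    if not probabilities:
        raise ValueError("at least one candidate label is required")
    # Sort labels from most to least probable.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    shortlist = [label for label, _ in ranked[:k]]
    return {"recommendation": shortlist[0], "shortlist": shortlist}


# Hypothetical predictions for one example to be labeled.
predicted = {"sports": 0.10, "politics": 0.60, "tech": 0.25, "arts": 0.05}
print(assist(predicted, k=2))
```

A small `k` speeds up the labeler only when the correct label actually appears in the shortlist, which matches the condition the abstract attaches to its accuracy and speed gains.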
HC · Apr 8, 2021
Perfection Not Required? Human-AI Partnerships in Code Translation
Justin D. Weisz, Michael Muller, Stephanie Houde et al.
Generative models have become adept at producing artifacts such as images, videos, and prose at human-like levels of proficiency. New generative techniques, such as unsupervised neural machine translation (NMT), have recently been applied to the task of generating source code, translating it from one programming language to another. The artifacts produced in this way may contain imperfections, such as compilation or logical errors. We examine the extent to which software engineers would tolerate such imperfections and explore ways to aid the detection and correction of those errors. Using a design scenario approach, we interviewed 11 software engineers to understand their reactions to the use of an NMT model in the context of application modernization, focusing on the task of translating source code from one language to another. Our three-stage scenario sparked discussions about the utility and desirability of working with an imperfect AI system, how acceptance of that system's outputs would be established, and future opportunities for generative AI in application modernization. Our study highlights how UI features such as confidence highlighting and alternate translations help software engineers work with and better understand generative NMT models.
HC · Feb 24, 2021
Documentation Matters: Human-Centered AI System to Assist Data Science Code Documentation in Computational Notebooks
April Yi Wang, Dakuo Wang, Jaimie Drozdal et al.
Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code, and neglect creating or updating their documentation during quick iterations. Inspired by human documentation practices learned from 80 highly-voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system to explore how human-centered AI systems can support human data scientists in the machine learning code documentation scenario. Themisto facilitates the creation of documentation via three approaches: a deep-learning-based approach to generate documentation for source code, a query-based approach to retrieve online API documentation for source code, and a user prompt approach to nudge users to write documentation. We evaluated Themisto in a within-subjects experiment with 24 data science practitioners, and found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code they would have ignored, and improved participants' satisfaction with their computational notebook.
CY · Jan 13, 2021
How AI Developers Overcome Communication Challenges in a Multidisciplinary Team: A Case Study
David Piorkowski, Soya Park, April Yi Wang et al.
The development of AI applications is a multidisciplinary effort, involving multiple roles collaborating with the AI developers, an umbrella term we use to include data scientists and other AI-adjacent roles on the same team. During these collaborations, there is a knowledge mismatch between AI developers, who are skilled in data science, and external stakeholders who are typically not. This difference leads to communication gaps, and the onus falls on AI developers to explain data science concepts to their collaborators. In this paper, we report on a study including analyses of both interviews with AI developers and artifacts they produced for communication. Using the analytic lens of shared mental models, we report on the types of communication gaps that AI developers face, how AI developers communicate across disciplinary and organizational boundaries, and how they simultaneously manage issues regarding trust and expectations.
HC · Jan 12, 2021
Expanding Explainability: Towards Social Transparency in AI systems
Upol Ehsan, Q. Vera Liao, Michael Muller et al.
As AI-powered systems increasingly mediate consequential decision-making, their explainability is critical for end-users to take informed and accountable actions. Explanations in human-human interactions are socially-situated. AI systems are often socio-organizationally embedded. However, Explainable AI (XAI) approaches have been predominantly algorithm-centered. We take a developmental step towards socially-situated XAI by introducing and exploring Social Transparency (ST), a sociotechnically informed perspective that incorporates the socio-organizational context into explaining AI-mediated decision-making. To explore ST conceptually, we conducted interviews with 29 AI users and practitioners grounded in a speculative design scenario. We suggested constitutive design elements of ST and developed a conceptual framework to unpack ST's effect and implications at the technical, decision-making, and organizational level. The framework showcases how ST can potentially calibrate trust in AI, improve decision-making, facilitate organizational collective actions, and cultivate holistic explainability. Our work contributes to the discourse of Human-Centered XAI by expanding the design space of XAI.
LG · Jan 7, 2021
How Much Automation Does a Data Scientist Want?
Dakuo Wang, Q. Vera Liao, Yunfeng Zhang et al.
Data science and machine learning (DS/ML) are at the heart of the recent advancements of many Artificial Intelligence (AI) applications. There is an active research thread in AI, AutoAI, that aims to develop systems for automating the end-to-end DS/ML lifecycle. However, do DS and ML workers really want to automate their DS/ML workflow? To answer this question, we first synthesize a human-centered AutoML framework with 6 User Role/Personas, 10 Stages and 43 Sub-Tasks, 5 Levels of Automation, and 5 Types of Explanation, through reviewing research literature and marketing reports. Second, we use the framework to guide the design of an online survey study with 217 DS/ML workers who had varying degrees of experience and different user roles "matching" our 6 roles/personas. We found that different user personas participated in distinct stages of the lifecycle, but not all stages. Their desired levels of automation and types of explanation for AutoML also varied significantly depending on the DS/ML stage and the user persona. Based on the survey results, we argue there is no rationale from user needs for complete automation of the end-to-end DS/ML lifecycle. We propose new next steps for user-controlled DS/ML automation.
SE · Dec 4, 2020
Quality Estimation & Interpretability for Code Translation
Mayank Agarwal, Kartik Talamadupula, Stephanie Houde et al.
Recently, the automated translation of source code from one programming language to another by using automatic approaches inspired by Neural Machine Translation (NMT) methods for natural languages has come under study. However, such approaches suffer from the same problem as previous NMT approaches on natural languages, viz. the lack of an ability to estimate and evaluate the quality of the translations; and consequently ascribe some measure of interpretability to the model's choices. In this paper, we attempt to estimate the quality of source code translations built on top of the TransCoder model. We consider the code translation task as an analog of machine translation (MT) for natural languages, with some added caveats. We present our main motivation from a user study built around code translation; and present a technique that correlates the confidences generated by that model to lint errors in the translated code. We conclude with some observations on these correlations, and some ideas for future work.
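Correlating model confidences with lint errors can be illustrated with a small sketch. The abstract does not state which correlation measure or granularity the authors use, so this example assumes a Pearson correlation computed over per-line values; the `pearson` helper and the data are hypothetical and are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-line data for one translated file:
# the model's confidence in each line, and whether a linter flagged it (1) or not (0).
line_confidence = [0.9, 0.8, 0.3, 0.2]
lint_flagged = [0, 0, 1, 1]

r = pearson(line_confidence, lint_flagged)
print(f"correlation between confidence and lint errors: {r:.2f}")
```

A strongly negative value would mean that low-confidence lines tend to be the ones a linter flags, which is the kind of signal that could be surfaced to a user as a quality estimate.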
HC · Jan 18, 2020
How do Data Science Workers Collaborate? Roles, Workflows, and Tools
Amy X. Zhang, Michael Muller, Dakuo Wang
Today, the prominence of data science within organizations has given rise to teams of data science workers collaborating on extracting insights from data, as opposed to individual data scientists working alone. However, we still lack a deep understanding of how data science workers collaborate in practice. In this work, we conducted an online survey with 183 participants who work in various aspects of data science. We focused on their reported interactions with each other (e.g., managers with engineers) and with different tools (e.g., Jupyter Notebook). We found that data science teams are extremely collaborative and work with a variety of stakeholders and tools during the six common steps of a data science workflow (e.g., clean data and train model). We also found that the collaborative practices workers employ, such as documentation, vary according to the kinds of tools they use. Based on these findings, we discuss design implications for supporting data science team collaborations and future research directions.
LG · Jan 17, 2020
Trust in AutoML: Exploring Information Needs for Establishing Trust in Automated Machine Learning Systems
Jaimie Drozdal, Justin Weisz, Dakuo Wang et al.
We explore trust in a relatively new area of data science: Automated Machine Learning (AutoML). In AutoML, AI methods are used to generate and optimize machine learning models by automatically engineering features, selecting models, and optimizing hyperparameters. In this paper, we seek to understand what kinds of information influence data scientists' trust in the models produced by AutoML. We operationalize trust as a willingness to deploy a model produced using automated methods. We report results from three studies (qualitative interviews, a controlled experiment, and a card-sorting task) to understand the information needs of data scientists for establishing trust in AutoML systems. We find that including transparency features in an AutoML tool increased user trust and understandability in the tool; and out of all proposed features, model performance metrics and visualizations are the most important information to data scientists when establishing their trust with an AutoML tool.
LG · Dec 13, 2019
AutoAIViz: Opening the Blackbox of Automated Artificial Intelligence with Conditional Parallel Coordinates
Daniel Karl I. Weidele, Justin D. Weisz, Eno Oduor et al.
Artificial Intelligence (AI) can now automate the algorithm selection, feature engineering, and hyperparameter tuning steps in a machine learning workflow. Commonly known as AutoML or AutoAI, these technologies aim to relieve data scientists from the tedious manual work. However, today's AutoAI systems often present only limited to no information about the process of how they select and generate model results. Thus, users often do not understand the process, nor do they trust the outputs. In this short paper, we provide a first user evaluation by 10 data scientists of an experimental system, AutoAIViz, that aims to visualize AutoAI's model generation process. We find that the proposed system helps users to complete the data science tasks, and increases their understanding, toward the goal of increasing trust in the AutoAI system.
HC · Dec 13, 2019
Enabling Value Sensitive AI Systems through Participatory Design Fictions
Q. Vera Liao, Michael Muller
Two general routes have been followed to develop artificial agents that are sensitive to human values: a top-down approach to encode values into the agents, and a bottom-up approach to learn from human actions, whether from real-world interactions or stories. Although both approaches have made exciting scientific progress, they may face challenges when applied to the current development practices of AI systems, which require the understanding of the specific domains and specific stakeholders involved. In this work, we bring together perspectives from the human-computer interaction (HCI) community, where designing technologies sensitive to user values has been a longstanding focus. We highlight several well-established areas focusing on developing empirical methods for inquiring into user values. Based on these methods, we propose participatory design fictions to study user values involved in AI systems and present preliminary results from a case study. With this paper, we invite the consideration of user-centered value inquiry and value learning.
CY · Sep 8, 2019
How Data Scientists Work Together With Domain Experts in Scientific Collaborations: To Find The Right Answer Or To Ask The Right Question?
Yaoli Mao, Dakuo Wang, Michael Muller et al.
In recent years there has been an increasing trend in which data scientists and domain experts work together to tackle complex scientific questions. However, such collaborations often face challenges. In this paper, we aim to decipher this collaboration complexity through a semi-structured interview study with 22 interviewees from teams of bio-medical scientists collaborating with data scientists. In the analysis, we adopt the Olsons' four-dimensions framework proposed in Distance Matters to code interview transcripts. Our findings suggest that besides the glitches in the collaboration readiness, technology readiness, and coupling of work dimensions, the tensions that exist in the common ground building process influence the collaboration outcomes, and then persist in the actual collaboration process. In contrast to prior works' general account of building a high level of common ground, the breakdowns of content common ground together with the strengthening of process common ground in this process are more beneficial for scientific discovery. We discuss why that is and what the design suggestions are, and conclude the paper with future directions and limitations.
HC · Sep 5, 2019
Human-AI Collaboration in Data Science: Exploring Data Scientists' Perceptions of Automated AI
Dakuo Wang, Justin D. Weisz, Michael Muller et al.
The rapid advancement of artificial intelligence (AI) is changing our lives in many ways. One application domain is data science. New techniques in automating the creation of AI, known as AutoAI or AutoML, aim to automate the work practices of data scientists. AutoAI systems are capable of autonomously ingesting and pre-processing data, engineering new features, and creating and scoring models based on target objectives (e.g., accuracy or run-time efficiency). Though AutoAI is not yet widely adopted, we are interested in understanding how it will impact the practice of data science. We conducted interviews with 20 data scientists who work at a large, multinational technology company and practice data science in various business settings. Our goal is to understand their current work practices and how these practices might change with AutoAI. Reactions were mixed: while informants expressed concerns about the trend of automating their jobs, they also strongly felt it was inevitable. Despite these concerns, they remained optimistic about their future job security due to a view that the future of data science work will be a collaboration between humans and AI systems, in which both automation and human expertise are indispensable.