HC · May 1, 2022
Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation
Nitesh Goyal, Ian Kivlichan, Rachel Rosen et al.
Machine learning models are commonly used to detect toxicity in online conversations. These models are trained on datasets annotated by human raters. We explore how raters' self-described identities affect how they annotate toxicity in online comments. We first define the concept of specialized rater pools: rater pools formed on the basis of raters' self-described identities rather than at random. We formed three such pools for this study, all drawn from raters in the U.S.: raters who identify as African American, raters who identify as LGBTQ, and raters who identify as neither. Each pool annotated the same set of comments, many of which reference these identity groups. We found that rater identity is a statistically significant factor in how raters annotate toxicity for identity-related annotations. Using a preliminary content analysis, we examined the comments with the most disagreement between rater pools and found nuanced differences in their toxicity annotations. Next, we trained models on the annotations from each rater pool and compared these models' scores on comments from several test sets. Finally, we discuss how using raters who self-identify with the subjects of comments can create more inclusive machine learning models and provide more nuanced ratings than random raters.
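The abstract mentions ranking comments by how much the rater pools disagreed, but it does not say how disagreement was measured. The sketch below is minimal and rests on two assumptions. First, labels are binary (1 = toxic). Second, the disagreement score is the spread (max minus min) of each pool's mean label, used here as a stand-in. The pool names (including "control" for raters who identify as neither group), the comment IDs, the toy data, and the function names are all hypothetical. This is not the authors' method.

```python
from statistics import mean

# Hypothetical data layout: pool name -> comment id -> binary toxicity labels
# (1 = toxic, 0 = not toxic) from the raters in that pool.
Annotations = dict[str, dict[str, list[int]]]


def pool_means(annotations: Annotations) -> dict[str, dict[str, float]]:
    """Average the labels that each pool assigned to each comment."""
    return {
        pool: {cid: mean(labels) for cid, labels in comments.items()}
        for pool, comments in annotations.items()
    }


def rank_by_disagreement(annotations: Annotations) -> list[tuple[str, float]]:
    """Rank comments by the spread (max - min) of pool-level mean toxicity."""
    means = pool_means(annotations)
    # Only compare comments that every pool annotated.
    shared = set.intersection(*(set(per_pool) for per_pool in means.values()))
    spread = {}
    for cid in shared:
        scores = [per_pool[cid] for per_pool in means.values()]
        spread[cid] = max(scores) - min(scores)
    # Largest disagreement first.
    return sorted(spread.items(), key=lambda item: item[1], reverse=True)


# Illustrative toy data (hypothetical, not from the study).
EXAMPLE_ANNOTATIONS: Annotations = {
    "african_american": {"c1": [1, 1, 0], "c2": [0, 0, 0]},
    "lgbtq": {"c1": [0, 0, 0], "c2": [0, 1, 0]},
    "control": {"c1": [1, 0, 0], "c2": [0, 0, 0]},
}

if __name__ == "__main__":
    for cid, score in rank_by_disagreement(EXAMPLE_ANNOTATIONS):
        print(f"{cid}: {score:.2f}")
```

The spread is only one possible choice. The variance across pools, or the difference between two specific pools, would be equally reasonable stand-ins.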
AI · Jul 14, 2023
"It is currently hodgepodge": Examining AI/ML Practitioners' Challenges during Co-production of Responsible AI Values
Rama Adithya Varanasi, Nitesh Goyal
Recently, the AI/ML research community has signaled an urgent need to establish Responsible AI (RAI) values and practices as part of the AI/ML lifecycle. Several organizations and communities are responding to this call by sharing RAI guidelines. However, multidisciplinary ML practitioners still face gaps in awareness, deliberation, and execution of such practices. This work contributes to the discussion by unpacking the co-production challenges practitioners face as they align their RAI values. We interviewed 23 individuals across 10 organizations who were tasked with shipping AI/ML-based products while upholding RAI norms. We found that both top-down and bottom-up institutional structures burden different roles and prevent them from upholding RAI values, a challenge that is further exacerbated when executing conflicting values. We share multiple value levers that practitioners used as strategies to resolve their challenges. We end with recommendations for inclusive and equitable RAI value practices, for creating supportive organizational structures, and for creating opportunities to further aid practitioners.
HC · Oct 24, 2023
ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles
Savvas Petridis, Ben Wedin, James Wexler et al.
Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively refine model outputs through their feedback, by helping them convert their feedback into a set of principles (i.e. a constitution) that dictate the model's behavior. From a formative study, we (1) found that users needed support converting their feedback into principles for the chatbot and (2) classified the different principle types desired by users. Inspired by these findings, we developed ConstitutionMaker, an interactive tool for converting user feedback into principles, to steer LLM-based chatbots. With ConstitutionMaker, users can provide either positive or negative feedback in natural language, select auto-generated feedback, or rewrite the chatbot's response; each mode of feedback automatically generates a principle that is inserted into the chatbot's prompt. In a user study with 14 participants, we compare ConstitutionMaker to an ablated version, where users write their own principles. With ConstitutionMaker, participants felt that their principles could better guide the chatbot, that they could more easily convert their feedback into principles, and that they could write principles more efficiently, with less mental demand. ConstitutionMaker helped users identify ways to improve the chatbot, formulate their intuitive responses to the model into feedback, and convert this feedback into specific and clear principles. Together, these findings inform future tools that support the interactive critiquing of LLM outputs.
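ConstitutionMaker inserts each generated principle into the chatbot's prompt. The sketch below illustrates only that insertion step. It takes the principle text as given. In the tool, that text is generated automatically from the user's feedback, and that generation step is not modeled here. The `Constitution` class, its methods, and the prompt layout are hypothetical and are not the tool's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Constitution:
    """Hypothetical holder for the principles that steer a chatbot."""

    base_prompt: str
    principles: list[str] = field(default_factory=list)

    def add_principle(self, principle: str) -> None:
        # Ignore blank or duplicate principles so the prompt stays concise.
        text = principle.strip()
        if text and text not in self.principles:
            self.principles.append(text)

    def build_prompt(self, conversation: list[tuple[str, str]]) -> str:
        """Place the principles between the base prompt and the dialogue."""
        lines = [self.base_prompt]
        if self.principles:
            lines.append("Principles:")
            lines.extend(f"- {p}" for p in self.principles)
        lines.extend(f"{speaker}: {text}" for speaker, text in conversation)
        lines.append("Bot:")  # Cue the model to produce the next bot turn.
        return "\n".join(lines)


if __name__ == "__main__":
    constitution = Constitution("You are a helpful travel assistant.")
    # In ConstitutionMaker, this text would be generated from user feedback.
    constitution.add_principle("Ask about the user's budget before suggesting hotels.")
    print(constitution.build_prompt([("User", "Plan a weekend in Kyoto.")]))
```

Keeping principles in a separate list, rather than editing the prompt string directly, makes individual principles easy to show, revise, or remove. The abstract does not describe how the tool itself stores them.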
HC · Mar 14
"It Became My Buddy, But I'm Not Afraid to Disagree": A Multi-Session Study of UX Evaluators Collaborating with Conversational AI Assistants
Emily Kuang, Ehsan Jahangirzadeh Soure, Luyao Shen et al.
AI-assisted usability analysis can potentially reduce the time and effort of finding usability problems, yet little is known about how an AI's perceived expertise influences evaluators' analytic strategies and perceptions over time. We ran a within-subjects, five-session study (six hours per participant) with 12 professional UX evaluators who worked with two conversational assistants (CAs) designed to appear novice-like or expert-like (differing in suggestion quantity and response accuracy). We logged behavioral measures (number of passes, suggestion acceptance rate), collected subjective ratings (trust, perceived efficiency), and conducted semi-structured interviews. Participants experienced an initial novelty effect followed by a dip in trust that recovered over time. Their efficiency improved as they shifted from a two-pass to a one-pass video inspection approach. Evaluators ultimately rated the experienced CA as significantly more efficient, trustworthy, and comprehensive, despite not perceiving expertise differences early on. We conclude with design implications for adapting AI expertise to enable calibrated human-AI collaboration.
HC · Mar 23
Surfacing and Applying Meaning: Supporting Hermeneutical Autonomy for LGBTQ+ People in Taiwan
Yi-Tong Chen, En-Kai Chang, Nanyi Bi et al.
Even after Taiwan legalized same-sex marriage in 2019, LGBTQ+ communities continue to face hostility on social media. Using the lens of hermeneutical injustice and autonomy, we examine how technological conditions affect LGBTQ+ individuals' identity exploration, narrative seeking, and community resilience. We conducted a multi-stage study with Taiwanese LGBTQ+ individuals that included in-depth interviews, participatory design workshops, and evaluation sessions. Participants described fragile yet creative strategies such as seeking validation in online interactions, reframing hostile content through theory, and relying on allies. Building on these insights, we designed and evaluated a retrieval-augmented, LLM-powered chatbot with four modes of interaction: reflection, validation, discussion, and allyship. Findings show that the system fosters hermeneutical autonomy by helping participants reframe hostile narratives, validate lived experiences, and scaffold identity exploration, while reducing the hermeneutical labor of navigating social media hostility. We conclude by outlining design implications for AI systems that advance hermeneutical autonomy through fluid self-representation, contextualized dialogue, and inclusive community participation.
AI · Apr 4, 2024
Designing for Human-Agent Alignment: Understanding what humans want from their agents
Nitesh Goyal, Minsuk Chang, Michael Terry
Our ability to build autonomous agents that leverage generative AI continues to grow by the day. As builders and users of such agents, we do not yet know which parameters we need to align on before agents start performing tasks on our behalf. To discover these parameters, we ran a qualitative empirical study on designing agents that can negotiate during a fictional yet relatable task: selling a camera online. We found that for an agent to perform the task successfully, humans/users and agents need to align along six dimensions: 1) Knowledge Schema Alignment, 2) Autonomy and Agency Alignment, 3) Operational Alignment and Training, 4) Reputational Heuristics Alignment, 5) Ethics Alignment, and 6) Human Engagement Alignment. These empirical findings extend previous work on process and specification alignment and on the need for values and safety in human-AI interactions. We then discuss three design directions for designers who are imagining a world filled with human-agent collaborations.
HC · Feb 22, 2022
"You have to prove the threat is real": Understanding the needs of Female Journalists and Activists to Document and Report Online Harassment
Nitesh Goyal, Leslie Park, Lucy Vasserman
Online harassment is a major societal challenge that affects multiple communities. Some members of these communities, like female journalists and activists, bear significantly greater impacts because their professions require them to be easily accessible and transparent about their identity, and involve highlighting stories of injustice. Through a multi-phase qualitative study involving a focus group and interviews with 27 female journalists and activists, we mapped the journey of a target who experiences harassment. We introduce the PMCR framework as a way to focus on needs for Prevention, Monitoring, Crisis, and Recovery. Focusing on Crisis and Recovery, we designed a tool that meets a target's needs for documenting evidence of harassment during a crisis and for creating reports that can be shared with support networks during recovery. Finally, we discuss users' feedback on this tool, highlight the needs of targets as they face the burden of harassment, and offer recommendations to future designers and scholars on how to develop tools that help targets manage their harassment.
HC · Mar 16, 2016
Effects of Sensemaking Translucence on Distributed Collaborative Analysis
Nitesh Goyal, Susan R. Fussell
Collaborative sensemaking requires analysts to share their information and insights with each other, but this sharing runs the risk of prematurely focusing the investigation on specific suspects. To address this tension, we propose and test an interface for collaborative crime analysis that aims to make analysts more aware of their sensemaking processes. In a controlled laboratory study, we compare our sensemaking translucence interface to a standard interface without special sensemaking features. We found that the sensemaking translucence interface significantly improved clue finding and crime solving performance, but that analysts rated it lower on subjective measures than the standard interface. We conclude that designing for distributed sensemaking requires balancing task performance against user experience, and real-time information sharing against data accuracy.
HC · Nov 19, 2015
Designing for Collaborative Sensemaking: Using Expert & Non-Expert Crowd
Nitesh Goyal
Crime solving is a domain where solutions are often discovered serendipitously. Unstructured crowd-based mechanisms for crime solving, such as Reddit, have so far failed. The mechanisms, collaborations, workflows, and micro-tasks necessary for successful crime solving might also vary across different crimes. Cognitively, experts might have deeper domain knowledge but might also fall prey to biased analysis. Non-experts, while lacking formal training, might instead offer unconventional perspectives that require direction. Analysis is itself an iterative process of foraging and sensemaking: users explore to broaden the solution space and then narrow down toward a solution, iterating until they identify the global maximum rather than a local maximum. In this proposal, my research aims to design systems that enable complex sensemaking tasks requiring collaboration between remotely located non-expert crowds and expert crowds, so that each compensates for the other's cognitive challenges and lack of training. This requires a better understanding of the structure, workflow, and micro-tasks necessary for successful collaboration. The proposal builds on previous lab experiments on collaborative sensemaking between remote partners and aims to scale this work across multiple team members with varying levels of expertise.
HC · Nov 18, 2015
Designing for Collaborative Sensemaking: Leveraging Human Cognition For Complex Tasks
Nitesh Goyal, Susan R. Fussell
My research aims to design systems for complex sensemaking by remotely located non-expert collaborators (crowds), to solve computationally hard problems like crimes.
HC · Jan 30, 2014
SPRING: speech and pronunciation improvement through games, for Hispanic children
Anuj Tewari, Nitesh Goyal, Matthew K Chan et al.
Poor English pronunciation is a major problem for immigrant populations in developed countries like the U.S. It poses various problems, including a barrier to entry into mainstream society. This paper presents a research study that explores combining speech technologies with activity-based and arcade-based games to provide pronunciation feedback to Hispanic children in the U.S. A three-month study with an immigrant population in California investigated and analyzed the effectiveness of computer-aided pronunciation feedback delivered through games. In addition to quantitative findings that point to statistically significant gains in pronunciation quality, the paper explores qualitative findings, interaction patterns, and the challenges the researchers faced in working with this community. It also describes the issues involved in treating pronunciation as a competency.