60.3 · HC · May 29
Computer-Aided Tagging on Wikimedia Commons: Designing for Human-AI Collaboration in Open Knowledge Work
Yihan Yu, David W. McDonald
This study investigates Wikimedia Commons contributors' lived experiences with the Computer-Aided Tagging (CAT) tool, an AI-assisted image tagging system designed to improve Commons' discoverability, searchability, accessibility, and multilingual support. Using a qualitative analysis of 595 CAT-related community comments from 11 wiki pages and 16 in-depth interviews, we identify seven key issues that contributed to CAT's mixed reception and eventual deactivation. We also offer community-informed suggestions for improving the tool. We reflect on the implications for designing human-AI collaboration on Commons and for developing AI-assisted tools that support open knowledge work. This work contributes to HCI and CSCW research by extending the understanding of human-AI collaboration beyond Anglophone, text-centric, corporate platforms.
66.5 · HC · Mar 23
Not Another EHR: Reimagining Physician Information Needs with Generative AI Technology
Ruican Zhong, Jiachen Li, Gary Hsieh et al. · uw
Electronic health records (EHRs) have improved data accessibility but have also introduced cognitive burden for physicians, given the sheer volume and complexity of the data involved. Advances in large language models (LLMs) create new opportunities to rethink how clinicians interact with medical data through dynamic, adaptive interfaces. In this position paper, we explore how generative AI can support physicians' information needs by enabling more dynamic interactions with patient data. Through semi-structured interviews with internal physicians at Microsoft, we identify key challenges in data navigation and synthesis, and characterize clinicians' information needs during diagnostic workflows. We further examine how physicians conceptualize the ways AI can help their work process and how these mental models shape expectations for interaction and trust. Based on these insights, we discuss design considerations for generative user interfaces that support clinician-centered workflows.
HC · Jun 14, 2025
Levels of Autonomy for AI Agents
K. J. Kevin Feng, David W. McDonald, Amy X. Zhang
Autonomy is a double-edged sword for AI agents, simultaneously unlocking transformative possibilities and serious risks. How can agent developers calibrate the appropriate levels of autonomy at which their agents should operate? We argue that an agent's level of autonomy can be treated as a deliberate design decision, separate from its capability and operational environment. In this work, we define five levels of escalating agent autonomy, characterized by the roles a user can take when interacting with an agent: operator, collaborator, consultant, approver, and observer. Within each level, we describe the ways in which a user can exert control over the agent and open questions about how to design the nature of user-agent interaction. We then highlight a potential application of our framework towards AI autonomy certificates to govern agent behavior in single- and multi-agent systems. We conclude by proposing early ideas for evaluating agents' autonomy. Our work aims to contribute meaningful, practical steps towards responsibly deployed and useful AI agents in the real world.
48.1 · HC · Apr 12
Making Sense of the Weather, Together: Collaborative Sensemaking in Severe Weather Livestreams
Julie A. Vera, Mark Zachry, David W. McDonald
This paper examines collaborative sensemaking during severe weather events through the emerging phenomenon of "weatherfluencers," or content creators who livestream meteorological interpretation on platforms like YouTube. Drawing from sensemaking theory, crisis informatics, and platform studies, we analyze how these creators navigate the sociotechnical dynamics of interpreting severe weather in real time with distributed audiences. Through critical incident analysis of 13 Particularly Dangerous Situation (PDS) storm warnings across three prominent weatherfluencers, we identify three key practices: multi-source information triangulation, temporal bridging techniques, and platform-specific adaptations that transform entertainment interfaces into safety-critical communication channels. Our analysis shows how these practices challenge existing models of crisis communication by integrating distributed expertise, collapsing temporal frames, and reconfiguring platform affordances. This research contributes to understanding how informal emergency communicators mediate between institutional alerting systems and public needs, and how visual, multimodal crisis communication differs from text-centered approaches.
HC · Jul 3, 2025
Synthetic Heuristic Evaluation: A Comparison between AI- and Human-Powered Usability Evaluation
Ruican Zhong, David W. McDonald, Gary Hsieh · uw
Usability evaluation is crucial in human-centered design but can be costly, requiring expert time and user compensation. In this work, we developed a method for synthetic heuristic evaluation using multimodal LLMs' ability to analyze images and provide design feedback. Comparing our synthetic evaluations to those by experienced UX practitioners across two apps, we found our evaluation identified 73% and 77% of usability issues, which exceeded the performance of 5 experienced human evaluators (57% and 63%). Compared to human evaluators, the synthetic evaluation maintained consistent performance across tasks and excelled in detecting layout issues, highlighting potential attentional and perceptual strengths of synthetic evaluation. However, synthetic evaluation struggled with recognizing some UI components and design conventions, as well as identifying cross-screen violations. Additionally, testing synthetic evaluations over time and across accounts revealed stable performance. Overall, our work highlights the performance differences between human and LLM-driven evaluations, informing the design of synthetic heuristic evaluations.
HC · Dec 30, 2018
Ease on Down the Code: Complex Collaborative Qualitative Coding Simplified with 'Code Wizard'
Abbas Ganji, Mania Orand, David W. McDonald
This paper describes the design and development of a preliminary qualitative coding tool as well as a method to improve the process of achieving inter-coder reliability (ICR) in small teams. Software applications that support qualitative coding do not sufficiently assist collaboration among coders and overlook some fundamental issues related to ICR. We propose a new dimension of collaborative coding called "coders' certainty" and demonstrate its ability to illustrate valuable code disagreements that are missing from existing approaches. Through a case study, we describe the utility of our tool, Code Wizard, and how it helped a group of researchers effectively collaborate to code naturalistic observation data. We report the valuable lessons we learned from the development of our tool and method: (1) identifying coders' certainty constitutes an important part of determining the quality of data analysis and facilitates identifying overlapping and ambiguous codes, (2) making the details of the coding process visible helps streamline the coding process and leads to a sense of ownership of the research results, and (3) there is valuable information hidden in coding disagreements that can be leveraged for improving the process of data analysis.
CY · Apr 13, 2016
Dissecting a Social Botnet: Growth, Content and Influence in Twitter
Norah Abokhodair, Daisy Yoo, David W. McDonald
Social botnets have become an important phenomenon on social media. There are many ways in which social bots can disrupt or influence online discourse, such as spamming hashtags, scamming Twitter users, and astroturfing. In this paper we considered one specific social botnet in Twitter to understand how it grows over time, how the content of tweets by the social botnet differs from that of regular users in the same dataset, and lastly, how the social botnet may have influenced the relevant discussions. Our analysis is based on qualitative coding of approximately 3,000 tweets in Arabic and English from the Syrian social botnet that was active for 35 weeks on Twitter before it was shut down. We find that the growth, behavior and content of this particular botnet did not specifically align with common conceptions of botnets. Further, we identify interesting aspects of the botnet that distinguish it from regular users.