Ilmi Yoon

HC
4papers
7citations
Novelty51%
AI Score44

4 Papers

39.5HCMay 6
Making AI Drafts Count: A Quality Threshold in Audio Description Workflows

Lana Do, Shasta Ihorn, Charity M. Pitcher-Cooper et al.

Audio description (AD) narrates visual elements in video for blind and low-vision audiences. Recent work has shown that giving novice describers an AI-generated draft to start from helps produce higher-quality AD and lowers the barrier to entry. What remains an open question is how draft quality shapes the editing process. We investigate this through GenAD, an AD generation pipeline that incorporates accessibility guidelines and contextual video information, and RefineAD, an editing interface for human revisions. Human-AI contributions are measured across text, timing, and delivery. In a within-subjects study, we compared authoring from scratch against editing AI drafts of varying quality. GenAD drafts cut completion time by more than half and significantly reduced cognitive load. In contrast, baseline drafts generated from simple, unguided prompts offered only modest benefits, pointing to a minimum quality threshold for effectiveness. Qualitative findings suggest this threshold is content-dependent; as visual complexity increases, so does the quality needed from AI drafts. We propose this as a design principle: effective AI assistance should clear a quality threshold suited to the target content, rather than simply be present.

55.8HCMar 18
Toward Scalable Patient Safety Training: A Prototype for Root Cause Analysis Simulation With AI Virtual Avatars

Yuqi Hu, Qiwen Xiong, Zhenzhen Qin et al.

Patient safety training is essential for preparing healthcare professionals to identify, investigate, and prevent adverse events. However, conventional simulation-based approaches often require substantial faculty time, physical resources, and standardized facilitation. This paper presents a prototype AI-powered simulation platform designed to support more scalable patient safety training through root cause analysis (RCA). The system provides a Unity-based 3D simulation environment, which allows trainees to investigate an ICU adverse event by interviewing five virtual team members represented as AI-powered avatars. Each avatar is driven by a large language model (LLM) agent with role-specific knowledge and variable states of mind. Moreover, emotional text-to-speech and AI-supported facial and body animation enable more realistic and immersive interactions. After completing the simulation, trainees submit a written RCA report and receive rubric-guided formative and summative feedback automatically generated by an LLM-based assessment component. The prototype is built to support patient safety training for healthcare professionals, focusing on skills in communication, investigation, thinking, and analysis, with low recurring instructional burden. We describe the design of the platform, its core technical components, and an RCA case based on a published ICU scenario. This work demonstrates the feasibility of integrating generative AI into immersive simulation for scalable patient safety education.

HCFeb 1
How well can VLMs rate audio descriptions: A multi-dimensional quantitative assessment framework

Lana Do, Gio Jung, Juvenal Francisco Barajas et al.

Digital video is central to communication, education, and entertainment, but without audio description (AD), blind and low-vision audiences are excluded. While crowdsourced platforms and vision-language-models (VLMs) expand AD production, quality is rarely checked systematically. Existing evaluations rely on NLP metrics and short-clip guidelines, leaving questions about what constitutes quality for full-length content and how to assess it at scale. To address these questions, we first developed a multi-dimensional assessment framework for uninterrupted, full-length video, grounded in professional guidelines and refined by accessibility specialists. Second, we integrated this framework into a comprehensive methodological workflow, utilizing Item Response Theory, to assess the proficiency of VLM and human raters against expert-established ground truth. Findings suggest that while VLMs can approximate ground-truth ratings with high alignment, their reasoning was found to be less reliable and actionable than that of human respondents. These insights show the potential of hybrid evaluation systems that leverage VLMs alongside human oversight, offering a path towards scalable AD quality control.

HCNov 7, 2021
NarrationBot and InfoBot: A Hybrid System for Automated Video Description

Shasta Ihorn, Yue-Ting Siu, Aditya Bodi et al.

Video accessibility is crucial for blind and low vision users for equitable engagements in education, employment, and entertainment. Despite the availability of professional and amateur services and tools, most human-generated descriptions are expensive and time consuming. Moreover, the rate of human-generated descriptions cannot match the speed of video production. To overcome the increasing gaps in video accessibility, we developed a hybrid system of two tools to 1) automatically generate descriptions for videos and 2) provide answers or additional descriptions in response to user queries on a video. Results from a mixed-methods study with 26 blind and low vision individuals show that our system significantly improved user comprehension and enjoyment of selected videos when both tools were used in tandem. In addition, participants reported no significant difference in their ability to understand videos when presented with autogenerated descriptions versus human-revised autogenerated descriptions. Our results demonstrate user enthusiasm about the developed system and its promise for providing customized access to videos. We discuss the limitations of the current work and provide recommendations for the future development of automated video description tools.