CL · Jan 20 · Code
PRiSM: Benchmarking Phone Realization in Speech Models
Shikhar Bharadwaj, Chin-Jou Li, Yoonjae Kim et al.
Phone recognition (PR) serves as the atomic interface for language-agnostic modeling for cross-lingual speech processing and phonetic analysis. Despite prolonged efforts in developing PR systems, current evaluations only measure surface-level transcription accuracy. We introduce PRiSM, the first open-source benchmark designed to expose blind spots in phonetic perception through intrinsic and extrinsic evaluation of PR systems. PRiSM standardizes transcription-based evaluation and assesses downstream utility in clinical, educational, and multilingual settings with transcription and representation probes. We find that diverse language exposure during training is key to PR performance, encoder-CTC models are the most stable, and specialized PR models still outperform Large Audio Language Models. PRiSM releases code, recipes, and datasets to move the field toward multilingual speech models with robust phonetic ability: https://github.com/changelinglab/prism.
RO · Mar 19 · Code
Introducing M: A Modular, Modifiable Social Robot
Victor Nikhil Antony, Zhili Gong, Yoonjae Kim et al.
We present M, an open-source, low-cost social robot platform designed to reduce the platform friction that slows social robotics research by making robots easier to reproduce, modify, and deploy in real-world settings. M combines a modular mechanical design, multimodal sensing, and an expressive yet mechanically simple actuation architecture with a ROS2-native software package that cleanly separates perception, expression control, and data management. The platform includes a simulation environment with interface equivalence to hardware to support rapid sim-to-real transfer of interaction behaviors. We demonstrate extensibility through additional sensing/actuation modules and provide example interaction templates for storytelling and two-way conversational coaching. Finally, we report real-world use in participatory design and week-long in-home deployments, showing how M can serve as a practical foundation for longitudinal, reproducible social robotics research.
HC · Mar 9
From Daily Song to Daily Self: Supporting Reflective Songwriting of Deaf and Hard-of-Hearing Individuals through Generative Music AI
Youjin Choi, Jinyoung Yoo, Jaeyoung Moon et al.
The rapid advancement of generative AI (GenAI) is expanding access to songwriting, offering a new medium of self-expression for Deaf and Hard-of-Hearing (DHH) individuals. However, emerging technologies that support DHH individuals in expressing themselves through music have largely been evaluated in single-session settings and often fall short in helping users unfamiliar with songwriting convey personal narratives or sustain engagement over time. This paper explores songwriting as an extended, music-based journaling practice that supports sustained emotional reflection over multiple sessions. We introduce SoulNote, a GenAI system enabling DHH individuals to engage in iterative songwriting. Following a user-centered design process that included a design workshop, a preliminary study, and a multi-session diary study, we found that ongoing songwriting with SoulNote facilitated emotional growth across three dimensions: self-insight, emotion regulation, and everyday attitudes toward emotions and self-care. Overall, this work demonstrates how GenAI can support marginalized communities by transforming creative expression into a daily practice of self-discovery and reflection.