CLOct 25, 2023
Physician Detection of Clinical Harm in Machine Translation: Quality Estimation Aids in Reliance and Backtranslation Identifies Critical ErrorsNikita Mehandru, Sweta Agrawal, Yimin Xiao et al.
A major challenge in the practical use of Machine Translation (MT) is that users lack guidance to make informed decisions about when to rely on outputs. Progress in quality estimation research provides techniques to automatically assess MT quality, but these techniques have primarily been evaluated in vitro by comparison against human judgments outside of a specific context of use. This paper evaluates quality estimation feedback in vivo with a human study simulating decision-making in high-stakes medical settings. Using Emergency Department discharge instructions, we study how interventions based on quality estimation versus backtranslation assist physicians in deciding whether to show MT outputs to a patient. We find that quality estimation improves appropriate reliance on MT, but backtranslation helps physicians detect more clinically harmful errors that QE alone often misses.
HCFeb 25, 2025
Comparing Native and Non-native English Speakers' Behaviors in Collaborative Writing through Visual AnalyticsYuexi Chen, Yimin Xiao, Kazi Tasnim Zinat et al.
Understanding collaborative writing dynamics between native speakers (NS) and non-native speakers (NNS) is critical for enhancing collaboration quality and team inclusivity. In this paper, we partnered with communication researchers to develop visual analytics solutions for comparing NS and NNS behaviors in 162 writing sessions across 27 teams. The primary challenges in analyzing writing behaviors are data complexity and the uncertainties introduced by automated methods. In response, we present \textsc{COALA}, a novel visual analytics tool that improves model interpretability by displaying uncertainties in author clusters, generating behavior summaries using large language models, and visualizing writing-related actions at multiple granularities. We validated the effectiveness of \textsc{COALA} through user studies with domain experts (N=2+2) and researchers with relevant experience (N=8). We present the insights discovered by participants using \textsc{COALA}, suggest features for future AI-assisted collaborative writing tools, and discuss the broader implications for analyzing collaborative processes beyond writing.
CLOct 11, 2025
Toward Machine Translation Literacy: How Lay Users Perceive and Rely on Imperfect TranslationsYimin Xiao, Yongle Zhang, Dayeon Ki et al.
As Machine Translation (MT) becomes increasingly commonplace, understanding how the general public perceives and relies on imperfect MT is crucial for contextualizing MT research in real-world applications. We present a human study conducted in a public museum (n=452), investigating how fluency and adequacy errors impact bilingual and non-bilingual users' reliance on MT during casual use. Our findings reveal that non-bilingual users often over-rely on MT due to a lack of evaluation strategies and alternatives, while experiencing the impact of errors can prompt users to reassess future reliance. This highlights the need for MT evaluation and NLP explanation techniques to promote not only MT quality, but also MT literacy among its users.
CLMay 31, 2025
The Hidden Language of Harm: Examining the Role of Emojis in Harmful Online Communication and Content ModerationYuhang Zhou, Yimin Xiao, Wei Ai et al.
Social media platforms have become central to modern communication, yet they also harbor offensive content that challenges platform safety and inclusivity. While prior research has primarily focused on textual indicators of offense, the role of emojis, ubiquitous visual elements in online discourse, remains underexplored. Emojis, despite being rarely offensive in isolation, can acquire harmful meanings through symbolic associations, sarcasm, and contextual misuse. In this work, we systematically examine emoji contributions to offensive Twitter messages, analyzing their distribution across offense categories and how users exploit emoji ambiguity. To address this, we propose an LLM-powered, multi-step moderation pipeline that selectively replaces harmful emojis while preserving the tweet's semantic intent. Human evaluations confirm our approach effectively reduces perceived offensiveness without sacrificing meaning. Our analysis also reveals heterogeneous effects across offense types, offering nuanced insights for online communication and emoji moderation.