HCApr 13
When Drawing Is Not Enough: Exploring Spontaneous Speech with Sketch for Intent Alignment in Multimodal LLMsWeiyan Shi, Dorien Herremans, Kenny Tsu Wei Choo
Early-stage design ideation often relies on rough sketches created under time pressure, leaving much of the designer's intent implicit. In practice, designers frequently speak while sketching, verbally articulating functional goals and ideas that are difficult to express visually. We introduce TalkSketchD, a sketch-while-speaking dataset that captures spontaneous speech temporally aligned with freehand sketches during early-stage toaster ideation. To examine the dataset's value, we conduct a sketch-to-image generation study comparing sketch-only inputs with sketches augmented by concurrent speech transcripts using multimodal large language models (MLLMs). Generated images are evaluated against designers' self-reported intent using a reasoning MLLM as a judge. Quantitative results show that incorporating spontaneous speech significantly improves judged intent alignment of generated design images across form, function, experience, and overall intent. These findings demonstrate that temporally aligned sketch-and-speech data can enhance MLLMs' ability to interpret user intent in early-stage design ideation.
HCApr 15
Acts of Configuration: Rethinking Provenance, Temporality and Legitimacy in Post-Mortem AgentsKellie Yu Hui Sim, Pin Sym Foong, Darryl Lim et al.
Work on persona-persistent post-mortem agents typically frames design around a life/death binary. This framing neglects a consequential yet under-theorised condition: when individuals remain alive but have impaired decisional capacity. Drawing on a multi-phase workshop in which participants trained and reflected on an AI agent for Advance Care Planning, we examined how people reason about agentic delegation post-capacity loss. Initially, participants favoured bounded agents grounded in first-party authorship and representational fidelity over autonomous or evolving stand-ins. However, temporality introduced novel ideas like adjacent use driven by persona persistence over functional expansion: agents should evolve while users retain capacity, remain static once capacity is lost, but somehow inform adjacent post-mortem uses. We discuss the implications of these findings and propose that the configuration of agents for post-capacity use reshapes our understanding of provenance, temporality, and legitimacy for post-mortem agents.
HCDec 12, 2025
Words to Describe What I'm Feeling: Exploring the Potential of AI Agents for High Subjectivity Decisions in Advance Care PlanningKellie Yu Hui Sim, Pin Sym Foong, Chenyu Zhao et al.
Loss of decisional capacity, coupled with the increasing absence of reliable human proxies, raises urgent questions about how individuals' values can be represented in Advance Care Planning (ACP). To probe this fraught design space of high-risk, high-subjectivity decision support, we built an experience prototype (\acpagent{}) and asked 15 participants in 4 workshops to train it to be their personal ACP proxy. We analysed their coping strategies and feature requests and mapped the results onto axes of agent autonomy and human control. Our findings show a surprising 86.7\% agreement with \acpagent{}, arguing for a potential new role of AI in ACP where agents act as personal advocates for individuals, building mutual intelligibility over time. We propose that the key areas of future risk that must be addressed are the moderation of users' expectations and designing accountability and oversight over agent deployment and cutoffs.
CVSep 18, 2024
Exploring Gaze Pattern Differences Between Autistic and Neurotypical Children: Clustering, Visualisation, and PredictionWeiyan Shi, Haihong Zhang, Wei Wang et al.
Autism Spectrum Disorder (ASD) affects children's social and communication abilities, with eye-tracking widely used to identify atypical gaze patterns. While unsupervised clustering can automate the creation of areas of interest for gaze feature extraction, the use of internal cluster validity indices, like Silhouette Coefficient, to distinguish gaze pattern differences between ASD and typically developing (TD) children remains underexplored. We explore whether internal cluster validity indices can distinguish ASD from TD children. Specifically, we apply seven clustering algorithms to gaze points and extract 63 internal cluster validity indices to reveal correlations with ASD diagnosis. Using these indices, we train predictive models for ASD diagnosis. Experiments on three datasets demonstrate high predictive accuracy (81\% AUC), validating the effectiveness of these indices.
CLMay 28, 2023Code
Evaluating GPT-3 Generated Explanations for Hateful Content ModerationHan Wang, Ming Shan Hee, Md Rabiul Awal et al.
Recent research has focused on using large language models (LLMs) to generate explanations for hate speech through fine-tuning or prompting. Despite the growing interest in this area, these generated explanations' effectiveness and potential limitations remain poorly understood. A key concern is that these explanations, generated by LLMs, may lead to erroneous judgments about the nature of flagged content by both users and content moderators. For instance, an LLM-generated explanation might inaccurately convince a content moderator that a benign piece of content is hateful. In light of this, we propose an analytical framework for examining hate speech explanations and conducted an extensive survey on evaluating such explanations. Specifically, we prompted GPT-3 to generate explanations for both hateful and non-hateful content, and a survey was conducted with 2,400 unique respondents to evaluate the generated explanations. Our findings reveal that (1) human evaluators rated the GPT-generated explanations as high quality in terms of linguistic fluency, informativeness, persuasiveness, and logical soundness, (2) the persuasive nature of these explanations, however, varied depending on the prompting strategy employed, and (3) this persuasiveness may result in incorrect judgments about the hatefulness of the content. Our study underscores the need for caution in applying LLM-generated explanations for content moderation. Code and results are available at https://github.com/Social-AI-Studio/GPT3-HateEval.
HCApr 15
"I'm Not Able to Be There for You": Emotional Labour, Responsibility, and AI in Peer SupportKellie Yu Hui Sim, Kenny Tsu Wei Choo
Peer support is increasingly positioned as a scalable response to gaps in mental health care, particularly in digitally mediated settings, yet what counts as peer support and how responsibility is distributed remain unevenly defined in practice. Drawing on interviews with peer supporters, we show how lived experience, moral commitment, and self-identification shape participation while blurring expectations around scope, authority, and accountability. Institutional ambiguity concentrates emotional labour, boundary-setting, and escalation of responsibility at the individual level, often without consistent organisational scaffolding. Participants evaluated AI not primarily through empathy or technical capability, but through how technologies redistribute risk, labour, and accountability within already fragile support roles. Building on these findings, we outline design futures for an AI-supported peer support ecosystem that foregrounds responsibility as a central design concern rather than treating AI as a mechanism of scale.
HCApr 15
Redistributing Voice and Responsibility: AI in Relationship-Centred CareKellie Yu Hui Sim, Kenny Tsu Wei Choo
Relationship-centred care (RCC) recognises that healthcare quality depends not only on outcomes, but on how voice, responsibility, and emotional labour are negotiated among patients, caregivers, and providers. As AI systems enter sensitive care contexts, they introduce a new participant into these negotiations. Drawing on empirical work in Advance Care Planning (ACP) and peer support, we argue that AI's primary impact in high-subjectivity domains is not optimisation but redistribution: it reorganises who speaks, who decides, and who bears moral responsibility. Across both settings, participants were less concerned with technical accuracy than with relational consequences: whether AI would appropriately represent their decision, reduce burden, or blur accountability, scaffold connection, or subtly displace it. We identify three relational dimensions: authority, temporality, and visibility, through which AI reshapes care relationships, and propose design provocations centred on relational legibility, bounded agency, responsibility traceability, and non-substitutive scaffolding.
CLMay 3, 2024
SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of SingaporeRi Chi Ng, Nirmalendu Prakash, Ming Shan Hee et al.
To address the limitations of current hate speech detection models, we introduce \textsf{SGHateCheck}, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and MHC, employing large language models for translation and paraphrasing into Singapore's main languages, and refining these with native annotators. \textsf{SGHateCheck} reveals critical flaws in state-of-the-art models, highlighting their inadequacy in sensitive content moderation. This work aims to foster the development of more effective hate speech detection tools for diverse linguistic environments, particularly for Singapore and Southeast Asia contexts.
HCMar 19, 2025
Envisioning an AI-Enhanced Mental Health EcosystemKellie Yu Hui Sim, Kenny Tsu Wei Choo
The rapid advancement of Large Language Models (LLMs), reasoning models, and agentic AI approaches coincides with a growing global mental health crisis, where increasing demand has not translated into adequate access to professional support, particularly for underserved populations. This presents a unique opportunity for AI to complement human-led interventions, offering scalable and context-aware support while preserving human connection in this sensitive domain. We explore various AI applications in peer support, self-help interventions, proactive monitoring, and data-driven insights, using a human-centred approach that ensures AI supports rather than replaces human interaction. However, AI deployment in mental health fields presents challenges such as ethical concerns, transparency, privacy risks, and risks of over-reliance. We propose a hybrid ecosystem where where AI assists but does not replace human providers, emphasising responsible deployment and evaluation. We also present some of our early work and findings in several of these AI applications. Finally, we outline future research directions for refining AI-enhanced interventions while adhering to ethical and culturally sensitive guidelines.
HCApr 7
Foreign Domestic Workers' Perspectives on an LLM-Based Emotional Support tool for Caregiving BurdenShin Shoon Nicholas Teng, Kenny Tsu Wei Choo
Foreign Domestic Workers (FDWs) play a central role in home-based eldercare yet often experience substantial emotional caregiving burden shaped by linguistic barriers, social isolation, and limited access to support. While caregiving burden has been extensively studied among familial caregivers, little is known about how FDWs engage with emotional support technologies. We present an exploratory qualitative study of how FDWs in Singapore interact with a Large Language Model (LLM)-driven chatbot as an everyday, non-clinical form of emotional support. Through interviews and guided chatbot interactions, we conducted an inductive thematic analysis of participants' experiences. We identify three design-relevant themes: chatbots were experienced as psychologically safe and emotionally validating; they supported linguistic accessibility by accommodating imperfect and fragmented language; and they were appropriated as multifunctional resources for reassurance, guidance, and companionship. We discuss implications for designing LLM-driven emotional support tools that foreground psychological safety, accessibility, and flexible appropriation.
HCJun 8, 2025
Sword and Shield: Uses and Strategies of LLMs in Navigating DisinformationGionnieve Lim, Bryan Chen Zhengyu Tan, Kellie Yu Hui Sim et al.
The emergence of Large Language Models (LLMs) presents a dual challenge in the fight against disinformation. These powerful tools, capable of generating human-like text at scale, can be weaponised to produce sophisticated and persuasive disinformation, yet they also hold promise for enhancing detection and mitigation strategies. This paper investigates the complex dynamics between LLMs and disinformation through a communication game that simulates online forums, inspired by the game Werewolf, with 25 participants. We analyse how Disinformers, Moderators, and Users leverage LLMs to advance their goals, revealing both the potential for misuse and combating disinformation. Our findings highlight the varying uses of LLMs depending on the participants' roles and strategies, underscoring the importance of understanding their effectiveness in this context. We conclude by discussing implications for future LLM development and online platform design, advocating for a balanced approach that empowers users and fosters trust while mitigating the risks of LLM-assisted disinformation.
HCMar 7
More Than 1v1: Human-AI Alignment in Early Developmental Communities with Multimodal LLMsWeiyan Shi, Kenny Tsu Wei Choo
In early developmental contexts, particularly in parent-child interaction analysis, alignment involves families and professionals such as speech-language pathologists (SLPs) who interpret children's everyday interactions from different roles. When multimodal large language models (MLLMs) are introduced to support this process, alignment becomes a question of how authority, responsibility, and emotional risk are distributed across stakeholders. Through a three-part study with five families and three SLPs, we trace how MLLM-generated outputs move from expert-facing analysis to parent-facing feedback. We propose layered community alignment: grounding representations in expert-aligned structures, mediating translation through professional guardrails, and enabling family-level adaptation within those boundaries. We argue that alignment in developmental settings should be treated as a community-governed process rather than an individual optimisation problem.
CVOct 13, 2025
BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language ModelsBryan Chen Zhengyu Tan, Zheng Weihua, Zhengyuan Liu et al.
As vision-language models (VLMs) are deployed globally, their ability to understand culturally situated knowledge becomes essential. Yet, existing evaluations largely assess static recall or isolated visual grounding, leaving unanswered whether VLMs possess robust and transferable cultural understanding. We introduce BLEnD-Vis, a multimodal, multicultural benchmark designed to evaluate the robustness of everyday cultural knowledge in VLMs across linguistic rephrasings and visual modalities. Building on the BLEnD dataset, BLEnD-Vis constructs 313 culturally grounded question templates spanning 16 regions and generates three aligned multiple-choice formats: (i) a text-only baseline querying from Region $\to$ Entity, (ii) an inverted text-only variant (Entity $\to$ Region), and (iii) a VQA-style version of (ii) with generated images. The resulting benchmark comprises 4,916 images and over 21,000 multiple-choice question (MCQ) instances, validated through human annotation. BLEnD-Vis reveals significant fragility in current VLM cultural knowledge; models exhibit performance drops under linguistic rephrasing and, whilst visual cues often aid performance, low cross-modal consistency highlights challenges in robustly integrating textual and visual understanding, particularly for lower-resource regions. BLEnD-Vis thus provides a crucial testbed for systematically analysing cultural robustness and multimodal grounding, exposing limitations and guiding the development of more culturally competent VLMs.
CVSep 12, 2025
Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online ComicsYuriel Ryan, Rui Yang Tan, Kenny Tsu Wei Choo et al.
Understanding humor is a core aspect of social intelligence, yet it remains a significant challenge for Large Multimodal Models (LMMs). We introduce PixelHumor, a benchmark dataset of 2,800 annotated multi-panel comics designed to evaluate LMMs' ability to interpret multimodal humor and recognize narrative sequences. Experiments with state-of-the-art LMMs reveal substantial gaps: for instance, top models achieve only 61% accuracy in panel sequencing, far below human performance. This underscores critical limitations in current models' integration of visual and textual cues for coherent narrative and humor understanding. By providing a rigorous framework for evaluating multimodal contextual and narrative reasoning, PixelHumor aims to drive the development of LMMs that better engage in natural, socially aware interactions.
HCJun 11, 2025
"Is This Really a Human Peer Supporter?": Misalignments Between Peer Supporters and Experts in LLM-Supported InteractionsKellie Yu Hui Sim, Roy Ka-Wei Lee, Kenny Tsu Wei Choo
Mental health is a growing global concern, prompting interest in AI-driven solutions to expand access to psychosocial support. Peer support, grounded in lived experience, offers a valuable complement to professional care. However, variability in training, effectiveness, and definitions raises concerns about quality, consistency, and safety. Large Language Models (LLMs) present new opportunities to enhance peer support interactions, particularly in real-time, text-based interactions. We present and evaluate an AI-supported system with an LLM-simulated distressed client, context-sensitive LLM-generated suggestions, and real-time emotion visualisations. 2 mixed-methods studies with 12 peer supporters and 5 mental health professionals (i.e., experts) examined the system's effectiveness and implications for practice. Both groups recognised its potential to enhance training and improve interaction quality. However, we found a key tension emerged: while peer supporters engaged meaningfully, experts consistently flagged critical issues in peer supporter responses, such as missed distress cues and premature advice-giving. This misalignment highlights potential limitations in current peer support training, especially in emotionally charged contexts where safety and fidelity to best practices are essential. Our findings underscore the need for standardised, psychologically grounded training, especially as peer support scales globally. They also demonstrate how LLM-supported systems can scaffold this development--if designed with care and guided by expert oversight. This work contributes to emerging conversations on responsible AI integration in mental health and the evolving role of LLMs in augmenting peer-delivered care.
HCJun 11, 2025
"I Said Things I Needed to Hear Myself": Peer Support as an Emotional, Organisational, and Sociotechnical Practice in SingaporeKellie Yu Hui Sim, Kenny Tsu Wei Choo
Peer support plays a vital role in expanding access to mental health care by providing empathetic, community-based support outside formal clinical systems. As digital platforms increasingly mediate such support, the design and impact of these technologies remain under-examined, particularly in Asian contexts. This paper presents findings from an interview study with 20 peer supporters in Singapore, who operate across diverse online, offline, and hybrid environments. Through a thematic analysis, we unpack how participants start, conduct, and sustain peer support, highlighting their motivations, emotional labour, and the sociocultural dimensions shaping their practices. Building on this grounded understanding, we surface design directions for culturally responsive digital tools that scaffold rather than supplant relational care. Drawing insights from qualitative accounts, we offer a situated perspective on how AI might responsibly augment peer support. This research contributes to human-centred computing by articulating the lived realities of peer supporters and proposing design implications for trustworthy and context-sensitive AI in mental health.
CVMar 17, 2025
Analyzing Swimming Performance Using Drone Captured Aerial VideosThu Tran, Kenny Tsu Wei Choo, Shaohui Foong et al.
Monitoring swimmer performance is crucial for improving training and enhancing athletic techniques. Traditional methods for tracking swimmers, such as above-water and underwater cameras, face limitations due to the need for multiple cameras and obstructions from water splashes. This paper presents a novel approach for tracking swimmers using a moving UAV. The proposed system employs a UAV equipped with a high-resolution camera to capture aerial footage of the swimmers. The footage is then processed using computer vision algorithms to extract the swimmers' positions and movements. This approach offers several advantages, including single camera use and comprehensive coverage. The system's accuracy is evaluated with both training and in competition videos. The results demonstrate the system's ability to accurately track swimmers' movements, limb angles, stroke duration and velocity with the maximum error of 0.3 seconds and 0.35~m/s for stroke duration and velocity, respectively.
CLJun 18, 2024
ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking PerturbationsYunze Xiao, Yujia Hu, Kenny Tsu Wei Choo et al.
Detecting hate speech and offensive language is essential for maintaining a safe and respectful digital environment. This study examines the limitations of state-of-the-art large language models (LLMs) in identifying offensive content within systematically perturbed data, with a focus on Chinese, a language particularly susceptible to such perturbations. We introduce \textsf{ToxiCloakCN}, an enhanced dataset derived from ToxiCN, augmented with homophonic substitutions and emoji transformations, to test the robustness of LLMs against these cloaking perturbations. Our findings reveal that existing models significantly underperform in detecting offensive content when these perturbations are applied. We provide an in-depth analysis of how different types of offensive content are affected by these perturbations and explore the alignment between human and model explanations of offensiveness. Our work highlights the urgent need for more advanced techniques in offensive language detection to combat the evolving tactics used to evade detection mechanisms.
HCJun 17, 2024
Towards Understanding Emotions for Engaged Mental Health ConversationsKellie Yu Hui Sim, Kohleen Tijing Fortuno, Kenny Tsu Wei Choo
Providing timely support and intervention is crucial in mental health settings. As the need to engage youth comfortable with texting increases, mental health providers are exploring and adopting text-based media such as chatbots, community-based forums, online therapies with licensed professionals, and helplines operated by trained responders. To support these text-based media for mental health--particularly for crisis care--we are developing a system to perform passive emotion-sensing using a combination of keystroke dynamics and sentiment analysis. Our early studies of this system posit that the analysis of short text messages and keyboard typing patterns can provide emotion information that may be used to support both clients and responders. We use our preliminary findings to discuss the way forward for applying AI to support mental health providers in providing better care.