Xiang Zhi Tan

HC
h-index: 17
4 papers
5 citations
Novelty: 45%
AI Score: 42

4 Papers

78.5 · HC · Jun 2
From 'What' to 'How' and 'Why': Sharing LLM-Generated Retrospective Summaries of Older Adults' Passive Tracking Data with Remote Family Members

Jiachen Li, Reina Szeyi Chan, Akshat Choube et al. · eth-zurich

With the growing prevalence of modern ubiquitous computing technologies, multi-modal tracking systems hold promise for providing timely awareness and reassurance to stakeholders such as remote family members (RFMs) of older adults, who play a central role in care coordination. However, combining heterogeneous data streams into high-level, meaningful content, such as retrospective summaries, remains challenging. While recent work has demonstrated the promise of large language models (LLMs) for interpreting multi-modal tracking data, less attention has been given to generating narrative accounts for stakeholders like RFMs, who possess rich personal knowledge of older adults and strong emotional responsibility, yet have limited visibility into their daily lives and limited capacity for caregiving. In this work, we explore how LLMs can be used to generate retrospective summaries from multi-modal tracking data for RFMs of older adults. We leveraged and customized an existing system, Vital Insight, to generate initial summaries for different dates and data-availability scenarios as technology probes, and conducted interviews with 11 RFMs to gather feedback. Based on these insights, we redesigned the system into a multi-layer, multi-agent, insight-driven summary approach that builds from objective statistics and descriptions to enriched, context-aware narratives. We then compared the redesigned summaries with the initial versions through a survey with the same 11 RFMs and found significant improvements in satisfaction, perceived helpfulness, trust, and willingness to receive the summaries. We conclude by presenting design implications for AI-generated summaries for RFMs and broader contexts, emphasizing the need to support RFMs' sensemaking shift from simply presenting "What" data were collected to explaining "How" is my loved one doing and "Why".

40.5 · HC · May 21
Remind Me To Check The Stove Before I Leave The House: Authoring Personalized Context-Aware Smart Home Reminders Using Everyday Language

Reina Szeyi Chan, Sujendra Jayant Gharat, Maya Lampi et al.

Reminder systems commonly rely on fixed schedules, location triggers, or simple rules, limiting their ability to leverage the rich sensing capabilities of modern smart homes. A key challenge lies in enabling users to specify context-aware reminders without requiring complex configurations. We present a system pipeline that supports reminder authoring through natural language and conversational interaction. The pipeline translates user requests into structured representations and executable logic, incorporating time-based, activity-based, sensor-based, and state-based conditions. We conducted two studies to examine how users express reminder intent and how conversational support influences the authoring process. In Study 1 (N=40), we analyzed 233 user-authored reminders and identified challenges in expressing reminders with diverse and complex logic. Based on these findings, we refined the system and evaluated it in Study 2 (N=10), demonstrating improved handling of time-based, activity-based, sensor-based, and state-based conditions. Our results highlight the diversity and ambiguity of user expressions and show that conversational guidance can help structure these expressions into flexible, context-aware reminders.
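As a rough illustration of what a "structured representation" of a reminder might look like, the sketch below encodes the request from the paper's title as a message plus a list of typed conditions. It is not the authors' pipeline: the class names (`Condition`, `Reminder`), the context keys (`activity`, `stove_on`), and the rule that all conditions must hold are assumptions made for this example, and the stove-state condition is added only to show a second condition type.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical snapshot of the home at one moment; all keys are illustrative.
Context = Dict[str, object]


@dataclass
class Condition:
    kind: str                          # "time", "activity", "sensor", or "state"
    description: str                   # human-readable form of the condition
    check: Callable[[Context], bool]   # executable logic for the condition


@dataclass
class Reminder:
    message: str
    conditions: List[Condition] = field(default_factory=list)

    def should_fire(self, ctx: Context) -> bool:
        # Assumed semantics: fire only when every condition holds.
        return all(c.check(ctx) for c in self.conditions)


# "Remind me to check the stove before I leave the house"
stove_reminder = Reminder(
    message="Check the stove",
    conditions=[
        Condition(
            kind="activity",
            description="user is leaving the house",
            check=lambda ctx: ctx.get("activity") == "leaving_home",
        ),
        Condition(
            kind="state",
            description="stove is on (assumed extra condition)",
            check=lambda ctx: ctx.get("stove_on") is True,
        ),
    ],
)

if __name__ == "__main__":
    ctx = {"activity": "leaving_home", "stove_on": True}
    if stove_reminder.should_fire(ctx):
        print(stove_reminder.message)
```

Keeping each condition's type next to its executable check is one way to let a conversational front end ask follow-up questions about a single ambiguous condition without rebuilding the whole reminder.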

86.8 · HC · May 4
TRACE: Temporal Reasoning over Context and Evidence for Activity Recognition in Smart Homes

Yingtian Shi, Abivishaq Balasubramanian, Jessica Herring et al.

Human activity recognition (HAR) in smart homes remains challenging because many daily activities exhibit similar local sensor patterns, while minimally intrusive sensing provides sparse and ambiguous observations. As a result, methods based on short temporal or event windows often fail to capture the broader temporal and behavioral context needed for reliable activity understanding. We present TRACE (Temporal Reasoning over Context and Evidence), a contextual activity recognition framework for smart homes that integrates multi-source sensor evidence with user-specific contextual priors to improve activity interpretation. Rather than treating recognition as a local classification problem, TRACE leverages contextual reasoning to resolve ambiguities, reduce fragmented predictions, and infer more semantically specific activities. We evaluate TRACE on public benchmarks and in a deployment study conducted in our smart-home environment. Results show that TRACE improves recognition accuracy for semantically complex activities, produces more temporally coherent predictions that better align with user-specific routines, and maintains robust performance under cross-domain transfer and missing-modality conditions. These findings demonstrate the value of contextual reasoning for advancing smart-home HAR.

RO · Oct 18, 2024
Coherence-Driven Multimodal Safety Dialogue with Active Learning for Embodied Agents

Sabit Hassan, Hye-Young Chung, Xiang Zhi Tan et al.

When assisting people in daily tasks, robots need to accurately interpret visual cues and respond effectively in diverse safety-critical situations, such as sharp objects on the floor. In this context, we present M-CoDAL, a multimodal dialogue system specifically designed for embodied agents to better understand and communicate in safety-critical situations. The system leverages discourse coherence relations to enhance its contextual understanding and communication abilities. To train this system, we introduce a novel clustering-based active learning mechanism that utilizes an external Large Language Model (LLM) to identify informative instances. Our approach is evaluated using a newly created multimodal dataset comprising 1K safety violations extracted from 2K Reddit images. These violations are annotated using a Large Multimodal Model (LMM) and verified by human annotators. Results with this dataset demonstrate that our approach improves the resolution of safety situations, user sentiment, and the safety of the conversation. Next, we deploy our dialogue system on a Hello Robot Stretch robot and conduct a within-subject user study with real-world participants. In the study, participants role-play two safety scenarios with different levels of severity with the robot and receive interventions from our model and a baseline system powered by OpenAI's ChatGPT. The study results corroborate and extend the findings from the automated evaluation, showing that our proposed system is more persuasive in a real-world embodied agent setting.