Patrick Carrington

HC
h-index: 21
10 papers
88 citations
Novelty: 42%
AI Score: 50

10 Papers

GR | Sep 13, 2024 | Code
WheelPoser: Sparse-IMU Based Body Pose Estimation for Wheelchair Users

Yunzhi Li, Vimal Mollyn, Kuang Yuan et al.

Despite researchers having extensively studied various ways to track body pose on-the-go, most prior work does not take into account wheelchair users, leading to poor tracking performance. Wheelchair users could greatly benefit from this pose information to prevent injuries, monitor their health, identify environmental accessibility barriers, and interact with gaming and VR experiences. In this work, we present WheelPoser, a real-time pose estimation system specifically designed for wheelchair users. Our system uses only four strategically placed IMUs on the user's body and wheelchair, making it far more practical than prior systems using cameras and dense IMU arrays. WheelPoser is able to track a wheelchair user's pose with a mean joint angle error of 14.30 degrees and a mean joint position error of 6.74 cm, more than three times better than similar systems using sparse IMUs. To train our system, we collect a novel WheelPoser-IMU dataset, consisting of 167 minutes of paired IMU sensor and motion capture data of people in wheelchairs, including wheelchair-specific motions such as propulsion and pressure relief. Finally, we explore the potential application space enabled by our system and discuss future opportunities. Open-source code, models, and dataset can be found here: https://github.com/axle-lab/WheelPoser.
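The two error figures quoted above can be illustrated with a minimal sketch. It assumes the standard definitions: joint position error as the Euclidean distance between predicted and reference joint positions, and joint angle error as the geodesic angle between predicted and reference joint rotations. The abstract does not spell out the exact evaluation protocol, so the function names and data layout here are hypothetical.

```python
import math


def mean_joint_position_error(pred, truth):
    """Average Euclidean distance between predicted and reference joints.

    pred, truth: lists of frames; each frame is a list of (x, y, z) joint
    positions in the same unit (for example, centimetres).
    """
    total, count = 0.0, 0
    for pred_frame, true_frame in zip(pred, truth):
        for p, t in zip(pred_frame, true_frame):
            total += math.dist(p, t)
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


def rotation_angle_deg(r_pred, r_true):
    """Geodesic angle in degrees between two 3x3 rotation matrices.

    Uses trace(R_pred^T R_true) = sum of element-wise products, then
    angle = arccos((trace - 1) / 2), clamped for numerical safety.
    """
    trace = sum(r_pred[i][j] * r_true[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def mean_joint_angle_error(pred_rots, true_rots):
    """Average rotation_angle_deg over all frames and joints."""
    angles = [
        rotation_angle_deg(rp, rt)
        for pred_frame, true_frame in zip(pred_rots, true_rots)
        for rp, rt in zip(pred_frame, true_frame)
    ]
    if not angles:
        raise ValueError("no joints to compare")
    return sum(angles) / len(angles)
```

Both metrics average over every joint in every frame, so a single badly tracked joint raises the mean only in proportion to how often it occurs.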

CV | Mar 12 | Code
OSCBench: Benchmarking Object State Change in Text-to-Video Generation

Xianjing Han, Bin Zhu, Shiqi Hu et al.

Text-to-video (T2V) generation models have made rapid progress in producing visually high-quality and temporally coherent videos. However, existing benchmarks primarily focus on perceptual quality, text-video alignment, or physical plausibility, leaving a critical aspect of action understanding largely unexplored: object state change (OSC) explicitly specified in the text prompt. OSC refers to the transformation of an object's state induced by an action, such as peeling a potato or slicing a lemon. In this paper, we introduce OSCBench, a benchmark specifically designed to assess OSC performance in T2V models. OSCBench is constructed from instructional cooking data and systematically organizes action-object interactions into regular, novel, and compositional scenarios to probe both in-distribution performance and generalization. We evaluate six representative open-source and proprietary T2V models using both human user study and multimodal large language model (MLLM)-based automatic evaluation. Our results show that, despite strong performance on semantic and scene alignment, current T2V models consistently struggle with accurate and temporally consistent object state changes, especially in novel and compositional settings. These findings position OSC as a key bottleneck in text-to-video generation and establish OSCBench as a diagnostic benchmark for advancing state-aware video generation models.

HC | Mar 11
"I followed what felt right, not what I was told": Autonomy, Coaching, and Recognizing Bias Through AI-Mediated Dialogue

Atieh Taheri, Hamza El Alaoui, Patrick Carrington et al.

Ableist microaggressions remain pervasive in everyday interactions, yet interventions to help people recognize them are limited. We present an experiment testing how AI-mediated dialogue influences recognition of ableism. 160 participants completed a pre-test, intervention, and a post-test across four conditions: AI nudges toward bias (Bias-Directed), inclusion (Neutral-Directed), unguided dialogue (Self-Directed), and a text-only non-dialogue (Reading). Participants rated scenarios on standardness of social experience and emotional impact; those in dialogue-based conditions also provided qualitative reflections. Quantitative results showed dialogue-based conditions produced stronger recognition than Reading, though trajectories diverged: biased nudges improved differentiation of bias from neutrality but increased overall negativity. Inclusive or no nudges remained more balanced, while Reading participants showed weaker gains and even declines. Qualitative findings revealed biased nudges were often rejected, while inclusive nudges were adopted as scaffolding. We contribute a validated vignette corpus, an AI-mediated intervention platform, and design implications highlighting trade-offs conversational systems face when integrating bias-related nudges.

HC | Mar 7, 2025
OSCAR: Object Status and Contextual Awareness for Recipes to Support Non-Visual Cooking

Franklin Mingzhe Li, Kaitlyn Ng, Bin Zhu et al.

Following recipes while cooking is an important but difficult task for visually impaired individuals. We developed OSCAR (Object Status Context Awareness for Recipes), a novel approach that provides recipe progress tracking and context-aware feedback on the completion of cooking tasks through tracking object statuses. OSCAR leverages both Large Language Models (LLMs) and Vision-Language Models (VLMs) to manipulate recipe steps, extract object status information, align visual frames with object status, and provide a cooking progress tracking log. We evaluated OSCAR's recipe-following functionality using 173 YouTube cooking videos and 12 real-world non-visual cooking videos to demonstrate OSCAR's capability to track cooking steps and provide contextual guidance. Our results highlight the effectiveness of using object status to improve performance compared to the baseline by over 20% across different VLMs, and we present factors that impact prediction performance. Furthermore, we contribute a dataset of real-world non-visual cooking videos with step annotations as an evaluation benchmark.

HC | Jul 5, 2025
More than One Step at a Time: Designing Procedural Feedback for Non-visual Makeup Routines

Franklin Mingzhe Li, Akihiko Oharazawa, Chloe Qingyu Zhu et al.

Makeup plays a vital role in self-expression, identity, and confidence - yet remains an underexplored domain for assistive technology, especially for people with vision impairments. While existing tools support isolated tasks such as color identification or product labeling, they rarely address the procedural complexity of makeup routines: coordinating step sequences, managing product placement, and assessing the final look with accessible feedback. To understand the real-world process, we conducted a contextual inquiry with 15 visually impaired makeup users, capturing real-time makeup application behaviors and their step-by-step information needs and assessment approaches. Our findings reveal embodied, tactile-first strategies; persistent challenges in blending, symmetry, and assessment; and a desire for honest, real-time, goal-aligned feedback. We also interviewed five professional makeup artists, who reviewed participant makeup videos and provided expert responses to participant-raised questions and assessment practices. We contribute a taxonomy of feedback needs in non-visual makeup, and outline design implications for future assistive systems - emphasizing hands-free, conversational interaction and context-aware, procedural support for expressive and independent beauty practices.

AI | Jul 4, 2025
Exploring Object Status Recognition for Recipe Progress Tracking in Non-Visual Cooking

Franklin Mingzhe Li, Kaitlyn Ng, Bin Zhu et al.

Cooking plays a vital role in everyday independence and well-being, yet remains challenging for people with vision impairments due to limited support for tracking progress and receiving contextual feedback. Object status - the condition or transformation of ingredients and tools - offers a promising but underexplored foundation for context-aware cooking support. In this paper, we present OSCAR (Object Status Context Awareness for Recipes), a technical pipeline that explores the use of object status recognition to enable recipe progress tracking in non-visual cooking. OSCAR integrates recipe parsing, object status extraction, visual alignment with cooking steps, and time-causal modeling to support real-time step tracking. We evaluate OSCAR on 173 instructional videos and a real-world dataset of 12 non-visual cooking sessions recorded by BLV individuals in their homes. Our results show that object status consistently improves step prediction accuracy across vision-language models, and reveal key factors that impact performance in real-world conditions, such as implicit tasks, camera placement, and lighting. We contribute the pipeline of context-aware recipe progress tracking, an annotated real-world non-visual cooking dataset, and design insights to guide future context-aware assistive cooking systems.
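The abstract does not describe how the time-causal modeling works. As one possible reading, assumed here purely for illustration, the sketch below tracks the current recipe step using only the current and earlier frames, never moves backward, and advances by at most a fixed number of steps per frame. The function name, the `max_jump` parameter, and the per-frame score matrix are all hypothetical and are not taken from OSCAR.

```python
def track_steps(frame_scores, max_jump=1):
    """Pick one recipe step per frame using only current and past frames.

    frame_scores: list of per-frame lists, where frame_scores[t][s] is a
    hypothetical similarity score between frame t and recipe step s
    (for example, how well the observed object status matches step s).
    The tracked step never decreases and advances by at most `max_jump`
    steps from one frame to the next.
    """
    tracked = []
    current = 0
    for scores in frame_scores:
        last_allowed = min(len(scores) - 1, current + max_jump)
        # Only steps at or after the current one are eligible.
        candidates = range(current, last_allowed + 1)
        # max() returns the first best candidate, so ties keep the current step.
        current = max(candidates, key=lambda s: scores[s])
        tracked.append(current)
    return tracked


if __name__ == "__main__":
    scores = [
        [0.9, 0.1, 0.0],
        [0.2, 0.7, 0.1],
        [0.8, 0.3, 0.2],  # a noisy frame that resembles step 0 again
        [0.1, 0.2, 0.9],
    ]
    print(track_steps(scores))
```

The forward-only constraint keeps a single noisy frame from resetting progress, at the cost of not being able to recover if the tracker advances too early.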

HC | Mar 8
From Autonomy to Sovereignty - A New Telos for Socially Assistive Technology

JiWoong Jang, Patrick Carrington, Andrew Begel

Social accessibility research faces a persistent tension: assistive technologies (AT) predominantly pursue independence, yet disabled people's experiences reveal rich preferences for interdependence. Our analysis of 90 papers from 2011-2025 uncovered that this stems from a deeper issue - which crystallized through dialogue with three bodies of theory: (1) self-determination theory (SDT), (2) symbolic interactionism, and (3) posthumanist perspectives and crip technoscience. SDT illuminates individual needs; symbolic interactionism addresses the construction of social meaning and stigma; posthumanist perspectives and crip technoscience together challenge normalcy, governance, and the human-machine boundary. Through their tensions, we identify relational sovereignty as an alternative telos - or goal - to autonomy. While our corpus equates autonomy with independence, sovereignty centers the power to choose between independence and interdependence. To operationalize this shift - from "Can they do it?" to "Do they get to decide?" - we introduce the Relational Sovereignty Matrix and four design interventions: (1) a sovereignty-centered reframing of SDT, (2) generative questions for justice-oriented reflection, (3) the idea of building through sovereign technical primitives, and (4) explicit consideration of power in AT design.

HC | Mar 8
The Three Praxes Framework - A Thematic Review and Map of Social Accessibility Research

JiWoong Jang, Patrick Carrington, Andrew Begel

Research in social accessibility aims to improve the lives of disabled people across diverse abilities and experiences by assisting with communication, relationships, and ecosystems of access. We seek to understand this intersectional body of work by analyzing social accessibility research from 2011 to 2025. Through constructivist grounded theory analysis of 90 papers (curated from 605), we develop the Three Praxes Framework: three sites of practice, Artifact (constructive), Ecosystem (relational), and Epistemology (theoretical); two cross-cutting stances toward change (Temporal Orientation and Stakeholder Focus); and one reflexive cycle modeling how insights can flow between praxes. Our analysis reveals these praxes operate largely in isolation, risking that insights remain academic exercises while assistive technologies reinforce existing barriers. We call on the field to realize a cycle where disabled people's lived experiences shape material realities, material practice generates theoretical knowledge, and both transform ecosystems of access.

HC | Jan 26, 2022
An Exploration of Captioning Practices and Challenges of Individual Content Creators on YouTube for People with Hearing Impairments

Franklin Mingzhe Li, Cheng Lu, Zhicong Lu et al.

Deaf and Hard-of-Hearing (DHH) audiences have long complained about the caption quality of many online videos created by individual content creators on video-sharing platforms (e.g., YouTube). However, there has been little exploration of the practices, challenges, and perceptions of online video captions from the perspectives of both individual content creators and DHH audiences. In this work, we first explore DHH audiences' feedback on and reactions to YouTube video captions through interviews with 13 DHH individuals, and uncover DHH audiences' experiences, challenges, and perceptions of watching videos created by individual content creators (e.g., manually added caption tags could create additional confidence and trust in caption quality for DHH audiences). We then discover individual content creators' practices, challenges, and perceptions of captioning their videos (e.g., back-captioning problems) by conducting a YouTube video analysis with 189 captioning-related YouTube videos, followed by a survey with 62 individual content creators. Overall, our findings provide an in-depth understanding of captions generated by individual content creators and bridge the knowledge gap between content creators and DHH audiences on captions.

HC | Jul 12, 2021
Non-Visual Cooking: Exploring Practices and Challenges of Meal Preparation by People with Visual Impairments

Franklin Mingzhe Li, Jamie Dorst, Peter Cederberg et al.

The reliance on vision for tasks related to cooking and eating healthy can present barriers to cooking for oneself and achieving proper nutrition. There has been little research exploring cooking practices and challenges faced by people with visual impairments. We present a content analysis of 122 YouTube videos to highlight the cooking practices of visually impaired people, and we describe detailed practices for 12 different cooking activities (e.g., cutting and chopping, measuring, testing food for doneness). Based on the cooking practices, we also conducted semi-structured interviews with 12 visually impaired people who have cooking experience and show existing challenges, concerns, and risks in cooking (e.g., tracking the status of tasks in progress, verifying whether things are peeled or cleaned thoroughly). We further discuss opportunities to support the current practices and improve the independence of people with visual impairments in cooking (e.g., zero-touch interactions for cooking). Overall, our findings provide guidance for future research exploring various assistive technologies to help people cook without relying on vision.