Nadir Weibel

HC
h-index: 6
13 papers
98 citations
Novelty: 29%
AI Score: 43

13 Papers

SD Feb 26
Same Words, Different Judgments: Modality Effects on Preference Alignment

Aaron Broukhim, Nadir Weibel, Eshin Jolly

Preference-based reinforcement learning (PbRL) is the dominant framework for aligning AI systems to human preferences, but its application to speech remains underexplored. We present a controlled cross-modal study of human and synthetic preference annotations, comparing text and audio evaluations of identical semantic content across 100 prompts. Audio preferences prove as reliable as text, with inter-rater agreement reaching good levels (ICC(2,k) ≈ .80) at ~9 raters -- the first ICC-based reliability characterization in the preference annotation literature for either modality. However, modality reshapes how people judge: audio raters exhibit narrower decision thresholds, reduced length bias, and more user-oriented evaluation criteria, with near-chance cross-modality agreement. Synthetic ratings further align with human judgments and predict inter-rater agreement, supporting their use both for triaging ambiguous pairs and as full replacements for human annotations.

HC Sep 13, 2024
Predicting Trust In Autonomous Vehicles: Modeling Young Adult Psychosocial Traits, Risk-Benefit Attitudes, And Driving Factors With Machine Learning

Robert Kaufman, Emi Lee, Manas Satish Bedmutha et al.

Low trust remains a significant barrier to Autonomous Vehicle (AV) adoption. To design trustworthy AVs, we need to better understand the individual traits, attitudes, and experiences that impact people's trust judgements. We use machine learning to understand the most important factors that contribute to young adult trust based on a comprehensive set of personal factors gathered via survey (n = 1457). Factors ranged from psychosocial and cognitive attributes to driving style, experiences, and perceived AV risks and benefits. Using the explainable AI technique SHAP, we found that perceptions of AV risks and benefits, attitudes toward feasibility and usability, institutional trust, prior experience, and a person's mental model are the most important predictors. Surprisingly, psychosocial and many technology- and driving-specific factors were not strong predictors. Results highlight the importance of individual differences for designing trustworthy AVs for diverse groups and lead to key implications for future design and research.

HC Sep 9, 2024
What Did My Car Say? Impact of Autonomous Vehicle Explanation Errors and Driving Context On Comfort, Reliance, Satisfaction, and Driving Confidence

Robert Kaufman, Aaron Broukhim, David Kirsh et al.

Explanations for autonomous vehicle (AV) decisions may build trust; however, explanations can contain errors. In a simulated driving study (n = 232), we tested how AV explanation errors, driving context characteristics (perceived harm and driving difficulty), and personal traits (prior trust and expertise) affected a passenger's comfort in relying on an AV, preference for control, confidence in the AV's ability, and explanation satisfaction. Errors negatively affected all outcomes. Surprisingly, despite identical driving, explanation errors reduced ratings of the AV's driving ability. Severity and potential harm amplified the negative impact of errors. Contextual harm and driving difficulty directly impacted outcome ratings and influenced the relationship between errors and outcomes. Prior trust and expertise were positively associated with outcome ratings. Results emphasize the need for accurate, contextually adaptive, and personalized AV explanations to foster trust, reliance, satisfaction, and confidence. We conclude with design, research, and deployment recommendations for trustworthy AV explanation systems.

CY Jul 1, 2024
Toward Automated Detection of Biased Social Signals from the Content of Clinical Conversations

Feng Chen, Manas Satish Bedmutha, Ray-Yuan Chung et al.

Implicit bias can impede patient-provider interactions and lead to inequities in care. Raising awareness is key to reducing such bias, but its manifestations in the social dynamics of patient-provider communication are difficult to detect. In this study, we used automated speech recognition (ASR) and natural language processing (NLP) to identify social signals in patient-provider interactions. We built an automated pipeline to predict social signals from audio recordings of 782 primary care visits that achieved 90.1% average accuracy across codes, and exhibited fairness in its predictions for white and non-white patients. Applying this pipeline, we identified statistically significant differences in provider communication behavior toward white versus non-white patients. In particular, providers expressed more patient-centered behaviors towards white patients including more warmth, engagement, and attentiveness. Our study underscores the potential of automated tools in identifying subtle communication signals that may be linked with bias and impact healthcare quality and equity.

CL Mar 11
Depression Detection at the Point of Care: Automated Analysis of Linguistic Signals from Routine Primary Care Encounters

Feng Chen, Manas Bedmutha, Janice Sabin et al.

Depression is underdiagnosed in primary care, yet timely identification remains critical. Recorded clinical encounters, increasingly common with digital scribing technologies, present an opportunity to detect depression from naturalistic dialogue. We investigated automated depression detection from 1,108 audio-recorded primary care encounters in the Establishing Focus study, with depression defined by PHQ-9 (n=253 depressed, n=855 non-depressed). We compared three supervised approaches, Sentence-BERT + Logistic Regression (LR), LIWC+LR and ModernBERT, against a zero-shot GPT-OSS. GPT-OSS achieved the strongest performance (AUPRC=0.510, AUROC=0.774), with LIWC+LR competitive among supervised models (AUPRC=0.500, AUROC=0.742). Combined dyadic transcripts outperformed single-speaker configurations, with providers linguistically mirroring patients in depression encounters, an additive signal not captured by either speaker alone. Meaningful detection is achievable from the first 128 patient tokens (AUPRC=0.356, AUROC=0.675), supporting in-the-moment clinical decision support. These findings argue for passively collected clinical audio as a low-burden complement to existing screening workflows.

HC Apr 17, 2024
Developing Situational Awareness for Joint Action with Autonomous Vehicles

Robert Kaufman, David Kirsh, Nadir Weibel

Unanswered questions about how human-AV interaction designers can support riders' informational needs hinder Autonomous Vehicle (AV) adoption. To achieve joint human-AV action goals - such as safe transportation, trust, or learning from an AV - sufficient situational awareness must be held by the human, AV, and human-AV system collectively. We present a systems-level framework that integrates cognitive theories of joint action and situational awareness as a means to tailor communications that meet the criteria necessary for goal success. This framework is based on four components of the shared situation: AV traits, action goals, subject-specific traits and states, and the situated driving context. AV communications should be tailored to these factors and be sensitive to changes in them. This framework can be useful for understanding individual, shared, and distributed human-AV situational awareness and designing for future AV communications that meet the informational needs and goals of diverse groups in diverse driving contexts.

CL May 7, 2025
Can Language Models Understand Social Behavior in Clinical Conversations?

Manas Satish Bedmutha, Feng Chen, Andrea Hartzler et al.

Effective communication between providers and their patients influences health and care outcomes. The effectiveness of such conversations has been linked not only to the exchange of clinical information, but also to a range of interpersonal behaviors, commonly referred to as social signals, which are often conveyed through non-verbal cues and shape the quality of the patient-provider relationship. Recent advances in large language models (LLMs) have demonstrated an increasing ability to infer emotional and social behaviors even when analyzing only textual information. As automation also increases in clinical settings, such as for transcription of patient-provider conversations, there is growing potential for LLMs to automatically analyze and extract social behaviors from these interactions. To explore the foundational capabilities of LLMs in tracking social signals in clinical dialogue, we designed task-specific prompts and evaluated model performance across multiple architectures and prompting styles using a highly imbalanced, annotated dataset spanning 20 distinct social signals, such as provider dominance and patient warmth. We present the first system capable of tracking all 20 of these coded signals, and uncover patterns in LLM behavior. Further analysis of model configurations and clinical context provides insights for enhancing LLM performance on social signal processing tasks in healthcare settings.

HC Nov 18, 2025
SweeperBot: Making 3D Browsing Accessible through View Analysis and Visual Question Answering

Chen Chen, Cuong Nguyen, Alexa Siu et al.

Accessing 3D models remains challenging for Screen Reader (SR) users. While some existing 3D viewers allow creators to provide alternative text, the text often lacks sufficient detail about the 3D models. Grounded in a formative study, this paper introduces SweeperBot, a system that enables SR users to leverage visual question answering to explore and compare 3D models. SweeperBot answers SR users' visual questions by combining an optimal view selection technique with the strength of generative- and recognition-based foundation models. An expert review with 10 Blind and Low-Vision (BLV) users with SR experience demonstrated the feasibility of using SweeperBot to assist BLV users in exploring and comparing 3D models. The quality of the descriptions generated by SweeperBot was validated by a second survey study with 30 sighted participants.

SD Nov 17, 2025
Preference-Based Learning in Audio Applications: A Systematic Analysis

Aaron Broukhim, Yiran Shen, Prithviraj Ammanabrolu et al.

Despite the parallel challenges that audio and text domains face in evaluating generative model outputs, preference learning remains remarkably underexplored in audio applications. Through a PRISMA-guided systematic review of approximately 500 papers, we find that only 30 (6%) apply preference learning to audio tasks. Our analysis reveals a field in transition: pre-2021 works focused on emotion recognition using traditional ranking methods (rankSVM), while post-2021 studies have pivoted toward generation tasks employing modern RLHF frameworks. We identify three critical patterns: (1) the emergence of multi-dimensional evaluation strategies combining synthetic, automated, and human preferences; (2) inconsistent alignment between traditional metrics (WER, PESQ) and human judgments across different contexts; and (3) convergence on multi-stage training pipelines that combine reward signals. Our findings suggest that while preference learning shows promise for audio, particularly in capturing subjective qualities like naturalness and musicality, the field requires standardized benchmarks, higher-quality datasets, and systematic investigation of how temporal factors unique to audio impact preference learning frameworks.

HC Nov 5, 2021
Understanding Barriers and Design Opportunities to Improve Healthcare and QOL for Older Adults through Voice Assistants

Chen Chen, Janet G. Johnson, Kemeberly Charles et al.

Voice-based Intelligent Virtual Assistants (IVAs) promise to improve healthcare management and Quality of Life (QOL) by introducing the paradigm of hands-free and eyes-free interactions. However, there has been little understanding of the challenges of designing such systems for older adults, especially when it comes to healthcare-related tasks. To tackle this, we consider the processes of care delivery and QOL enhancements for older adults as a collaborative task between patients and providers. By interviewing 16 older adults living independently or semi-independently and 5 providers, we identified 12 barriers that older adults might encounter during daily routines and while managing health. We ultimately highlighted key design challenges and opportunities that might be introduced when integrating voice-based IVAs into the lives of older adults. Our work will benefit practitioners who study and attempt to create full-fledged IVA-powered smart devices to deliver better care and support an increased QOL for aging populations.

HC Feb 13, 2020
Interactive Multi-User 3D Visual Analytics in Augmented Reality

Wanze Xie, Yining Liang, Janet Johnson et al.

This publication reports on a research project in which we set out to explore the advantages and disadvantages augmented reality (AR) technology has for visual data analytics. We developed a prototype of an AR data analytics application, which provides users with an interactive 3D interface, hand gesture-based controls and multi-user support for a shared experience, enabling multiple people to collaboratively visualize, analyze and manipulate data with high dimensional features in 3D space. Our software prototype, called DataCube, runs on the Microsoft HoloLens - one of the first true stand-alone AR headsets, through which users can see computer-generated images overlaid onto real-world objects in the user's physical environment. Using hand gestures, the users can select menu options, control the 3D data visualization with various filtering and visualization functions, and freely arrange the various menus and virtual displays in their environment. The shared multi-user experience allows all participating users to see and interact with the virtual environment; changes one user makes become visible to the other users instantly. As users engage together, they are not restricted from observing the physical world simultaneously, and therefore they can also see non-verbal cues such as gesturing or facial reactions of other users in the physical environment. The main objective of this research project was to find out if AR interfaces and collaborative analysis can provide an effective solution for data analysis tasks, and our experience with our prototype system confirms this.

CY Dec 19, 2016
Managing Commercial HVAC Systems: What do Building Operators Really Need?

Bharathan Balaji, Nadir Weibel, Yuvraj Agarwal

Buildings form an essential part of modern life; people spend a significant amount of their time in them, and they consume large amounts of energy. A variety of systems provide services such as lighting, air conditioning and security, which are managed using Building Management Systems (BMS) by building operators. To better understand the capability of current BMS and characterize common practices of building operators, we investigated their use across five institutions in the US. We interviewed ten operators and discovered that BMS do not address a number of key concerns for the management of buildings. Our analysis is rooted in the everyday work of building operators and highlights a number of design suggestions to help improve the user experience and management of BMS, ultimately leading to improvements in productivity, as well as building comfort and energy efficiency.

HC Jan 26, 2016
Genie: A Longitudinal Study Comparing Physical and Software-augmented Thermostats in Office Buildings

Bharathan Balaji, Jason Koh, Nadir Weibel et al.

Thermostats are primary interfaces for occupants of office buildings to express their comfort preferences. However, standard thermostats are often ineffective due to inaccessibility, lack of information, or limited responsiveness, leading to occupant discomfort. Software thermostats based on web or smartphone applications provide alternative interfaces to occupants with minimal deployment cost. However, their usage and effectiveness have not been studied extensively in real settings. In this paper we present Genie, a novel software-augmented thermostat that we deployed and studied at our university over a period of 21 months. Our data shows that providing wider thermal control to users does not lead to system abuse and that the effect on energy consumption is minimal while improving comfort and energy awareness. We believe that increased introduction of software thermostats in office buildings will have important effects on comfort and energy consumption and we provide key design recommendations for their implementation and deployment.