HC · Mar 20
Promoting Critical Thinking With Domain-Specific Generative AI Provocations
Thomas Şerban von Davier, Hao-Ping Lee, Jodi Forlizzi et al.
The evidence on the effects of generative AI (GenAI) on critical thinking is mixed, with studies suggesting both potential harms and benefits depending on its implementation. Some argue that AI-driven provocations, such as questions asking for human clarification and justification, are beneficial for eliciting critical thinking. Drawing on our experience designing and evaluating two GenAI-powered tools for knowledge work, ArtBot in the domain of fine art interpretation and Privy in the domain of AI privacy, we reflect on how design decisions shape the form and effectiveness of such provocations. Our observations and user feedback suggest that domain-specific provocations, implemented through productive friction and interactions that depend on user contribution, can meaningfully support critical thinking. We present participant experiences with both prototypes and discuss how supporting critical thinking may require moving beyond static provocations toward approaches that adapt to user preferences and levels of expertise.
HC · Apr 15, 2025
Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective
Qiaosi Wang, Xuhui Zhou, Maarten Sap et al. · allen-ai, cmu
The last couple of years have witnessed emerging research that appropriates Theory-of-Mind (ToM) tasks designed for humans to benchmark LLMs' ToM capabilities as an indication of their social intelligence. However, this approach has a number of limitations. Drawing on existing psychology and AI literature, we summarize its theoretical, methodological, and evaluation limitations, pointing out that certain issues are inherent in the original ToM tasks used to evaluate humans' ToM and that these issues persist, and are exacerbated, when the tasks are appropriated to benchmark LLMs' ToM. From a human-computer interaction (HCI) perspective, these limitations prompt us to rethink the definition and criteria of ToM in ToM benchmarks toward a more dynamic, interactional approach that accounts for user preferences, needs, and experiences with LLMs in such evaluations. We conclude by outlining potential opportunities and challenges in this direction.
HC · Feb 25, 2025
AI Mismatches: Identifying Potential Algorithmic Harms Before AI Development
Devansh Saxena, Ji-Youn Jung, Jodi Forlizzi et al.
AI systems are often introduced with high expectations, yet many fail to deliver, resulting in unintended harm and missed opportunities for benefit. We frequently observe significant "AI Mismatches", where the system's actual performance falls short of what is needed to ensure safety and co-create value. These mismatches are particularly difficult to address once development is underway, highlighting the need for early-stage intervention. Navigating complex, multi-dimensional risk factors that contribute to AI Mismatches is a persistent challenge. To address it, we propose an AI Mismatch approach to anticipate and mitigate risks early on, focusing on the gap between realistic model performance and required task performance. Through an analysis of 774 AI cases, we extracted a set of critical factors, which informed the development of seven matrices that map the relationships between these factors and highlight high-risk areas. Through case studies, we demonstrate how our approach can help reduce risks in AI development.
HC · May 21, 2025
Exploring the Innovation Opportunities for Pre-trained Models
Minjung Park, Jodi Forlizzi, John Zimmerman
Innovators transform the world by understanding where services are successfully meeting customers' needs and then using this knowledge to identify failsafe opportunities for innovation. Pre-trained models have changed the AI innovation landscape, making it faster and easier to create new AI products and services. Understanding where pre-trained models are successful is critical for supporting AI innovation. Unfortunately, the hype cycle surrounding pre-trained models makes it hard to know where AI can really be successful. To address this, we investigated pre-trained model applications developed by HCI researchers as a proxy for commercially successful applications. The research applications demonstrate technical capabilities, address real user needs, and avoid ethical challenges. Using an artifact analysis approach, we categorized capabilities, opportunity domains, data types, and emerging interaction design patterns, uncovering some of the opportunity space for innovation with pre-trained models.
HC · Dec 14, 2025
Can You Keep a Secret? Exploring AI for Care Coordination in Cognitive Decline
Alicia Lee, Mai Lee Chang et al.
The increasing number of older adults who experience cognitive decline places a burden on informal caregivers, whose support with tasks of daily living determines whether older adults can remain in their homes. To explore how agents might help lower-SES older adults age in place, we interviewed ten pairs of older adults experiencing cognitive decline and their informal caregivers. We explored how they coordinate care, manage burdens, and sustain autonomy and privacy. Older adults exercised control by delegating tasks to specific caregivers and by keeping their adult children from knowing about all the care they received. Many abandoned some tasks of daily living, lowering their quality of life to ease caregiver burden. One effective strategy, piggybacking, uses spontaneous overlaps in errands to get more work done with less caregiver effort. This raises two questions: (i) Can agents help with piggyback coordination? (ii) Would such help keep older adults in their homes longer without increasing caregiver burden?
HC · Feb 11
Situated, Dynamic, and Subjective: Envisioning the Design of Theory-of-Mind-Enabled Everyday AI with Industry Practitioners
Qiaosi Wang, Jini Kim, Avanita Sharma et al.
Theory of Mind (ToM) -- the ability to infer what others are thinking (e.g., intentions) from observable cues -- is traditionally considered fundamental to human social interactions. This has sparked growing efforts in building and benchmarking AI's ToM capability, yet little is known about how such capability could translate into the design and experience of everyday user-facing AI products and services. We conducted 13 co-design sessions with 26 U.S.-based AI practitioners to envision, reflect on, and distill design recommendations for ToM-enabled everyday AI products and services that are both future-looking and grounded in the realities of AI design and development practices. Analysis revealed three interrelated design recommendations: ToM-enabled AI should 1) be situated in the social contexts that shape users' mental states, 2) be responsive to the dynamic nature of mental states, and 3) be attuned to subjective individual differences. We surface design tensions within each recommendation that reveal a broader gap between practitioners' envisioned futures of ToM-enabled AI and the realities of current AI design and development practices. These findings point toward the need to move beyond static, inference-driven approaches to ToM and toward designing ToM as a pervasive capability that supports continuous human-AI interaction loops.
HC · Sep 27, 2025
Privy: Envisioning and Mitigating Privacy Risks for Consumer-facing AI Product Concepts
Hao-Ping Lee, Yu-Ju Yang, Matthew Bilik et al.
AI creates and exacerbates privacy risks, yet practitioners lack effective resources to identify and mitigate these risks. We present Privy, a tool that guides practitioners through structured privacy impact assessments to: (i) identify relevant risks in novel AI product concepts, and (ii) propose appropriate mitigations. Privy was shaped by a formative study with 11 practitioners, which informed two versions -- one LLM-powered, the other template-based. We evaluated these two versions of Privy through a between-subjects, controlled study with 24 separate practitioners, whose assessments were reviewed by 13 independent privacy experts. Results show that Privy helps practitioners produce privacy assessments that experts deemed high quality: practitioners identified relevant risks and proposed appropriate mitigation strategies. These effects were stronger with the LLM-powered version. Practitioners themselves rated Privy as useful and usable, and their feedback illustrates how it helps overcome long-standing awareness, motivation, and ability barriers in privacy work.
HC · Oct 7, 2019
Keeping Designers in the Loop: Communicating Inherent Algorithmic Trade-offs Across Multiple Objectives
Bowen Yu, Ye Yuan, Loren Terveen et al.
Artificial intelligence algorithms have been used to enhance a wide variety of products and services, including assisting human decision making in high-stakes contexts. However, these algorithms are complex and have trade-offs, notably between prediction accuracy and fairness to population subgroups. This makes it hard for designers to understand algorithms and design products or services in a way that respects users' goals, values, and needs. We proposed a method to help designers and users explore algorithms, visualize their trade-offs, and select algorithms with trade-offs consistent with their goals and needs. We evaluated our method on the problem of predicting criminal defendants' likelihood to re-offend through (i) a large-scale Amazon Mechanical Turk experiment, and (ii) in-depth interviews with domain experts. Our evaluations show that our method can help designers and users of these systems better understand and navigate algorithmic trade-offs. This paper contributes a new way of providing designers the ability to understand and control the outcomes of algorithmic systems they are creating.
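To make "selecting algorithms with trade-offs consistent with their goals" concrete, here is a minimal Python sketch. It is not the authors' method or interface: it assumes each candidate model has already been scored on accuracy and on a fairness measure (both higher-is-better, and the fairness metric is an assumption), keeps only Pareto-optimal candidates, and then picks the most accurate one that meets a designer-chosen fairness floor. The class and function names (`Candidate`, `pareto_front`, `pick`) and all scores are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A trained model scored on two objectives; higher is better for both."""
    name: str
    accuracy: float
    fairness: float  # assumed metric, e.g., 1 minus the error-rate gap between subgroups


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    at_least_as_good = a.accuracy >= b.accuracy and a.fairness >= b.fairness
    strictly_better = a.accuracy > b.accuracy or a.fairness > b.fairness
    return at_least_as_good and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that no other candidate dominates (input order preserved)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]


def pick(candidates: list[Candidate], min_fairness: float) -> Candidate | None:
    """Most accurate Pareto-optimal candidate whose fairness meets the designer's floor."""
    eligible = [c for c in pareto_front(candidates) if c.fairness >= min_fairness]
    return max(eligible, key=lambda c: c.accuracy, default=None)


# Hypothetical scores, for illustration only.
cands = [
    Candidate("A", accuracy=0.70, fairness=0.95),
    Candidate("B", accuracy=0.74, fairness=0.88),
    Candidate("C", accuracy=0.72, fairness=0.85),  # dominated by B
    Candidate("D", accuracy=0.76, fairness=0.80),
]

print([c.name for c in pareto_front(cands)])  # ['A', 'B', 'D']
print(pick(cands, min_fairness=0.85).name)    # B
```

Restricting the choice to the Pareto front means the designer never trades away accuracy or fairness for nothing; raising the fairness floor (here to 0.9) moves the choice to a less accurate but fairer model.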
HC · Aug 20, 2019
Challenges of Designing HCI for Negative Emotions
Michal Luria, Amit Zoran, Jodi Forlizzi
Emotions that are perceived as "negative" are inherent in the human experience. Yet not much work in the field of HCI has looked into the role of these emotions in interaction with technology. As technology is becoming more social, personal and emotional by mediating our relationships and generating new social entities (such as conversational agents and robots), it is valuable to consider how it can support people's negative emotions and behaviors. Research in Psychology shows that interacting with negative emotions correctly can benefit well-being, yet the boundary between helpful and harmful is delicate. This workshop paper looks at the opportunities of designing for negative affect, and the challenge of "causing no harm" that arises in an attempt to do so.
HC · Aug 20, 2019
Championing Research Through Design in HRI
Michal Luria, John Zimmerman, Jodi Forlizzi
One of the challenges in conducting research at the intersection of the CHI and Human-Robot Interaction (HRI) communities is addressing the gap between the two in which design research methods are considered acceptable. While HRI is focused on interaction with robots and includes design research in its scope, the community is not as accustomed to exploratory design methods as the CHI community. This workshop paper argues for bringing exploratory design, and specifically the Research through Design (RtD) methods that have been established in CHI over the past decade, to the foreground of HRI. RtD can enable design researchers in HRI to conduct exploratory design work that asks what the right thing to design is, and to share that work within the community.
RO · Jul 9, 2017
Mathematical Models of Adaptation in Human-Robot Collaboration
Stefanos Nikolaidis, Jodi Forlizzi, David Hsu et al.
A robot operating in isolation needs to reason over the uncertainty in its model of the world and adapt its own actions to account for this uncertainty. Similarly, a robot interacting with people needs to reason about its uncertainty over the human's internal state, as well as over how this state may change as humans adapt to the robot. This paper summarizes our own work in this area, which shows the different ways that probabilistic planning and game-theoretic algorithms can enable such reasoning in robotic systems that collaborate with people. We start with a general formulation of the problem as a two-player game with incomplete information. We then articulate the different assumptions within this general formulation, and we explain how these lead to exciting and diverse robot behaviors in real-time interactions with actual human subjects, in a variety of manufacturing, personal robotics and assistive care settings.
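The core idea of reasoning over uncertainty about a hidden human internal state can be illustrated with a belief update. The following Python sketch is only that, an illustration, and not the paper's game-theoretic formulation: it assumes a small set of hypothetical human "types", an invented observation model `LIKELIHOOD`, and a toy adaptation rule `robot_policy`. The robot updates its belief with Bayes' rule after each observed human action and switches plans when the human is probably unwilling to adapt.

```python
from __future__ import annotations

# Hypothetical observation model P(observed human action | human type).
# The type names, actions, and numbers are invented for illustration.
LIKELIHOOD = {
    "adaptable": {"follows_robot": 0.8, "insists_on_own_plan": 0.2},
    "stubborn": {"follows_robot": 0.1, "insists_on_own_plan": 0.9},
}


def update(belief: dict[str, float], action: str) -> dict[str, float]:
    """Bayes' rule: posterior(type) is proportional to prior(type) * P(action | type)."""
    unnormalized = {t: p * LIKELIHOOD[t][action] for t, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError(f"action {action!r} has zero probability under the current belief")
    return {t: v / total for t, v in unnormalized.items()}


def robot_policy(belief: dict[str, float], threshold: float = 0.5) -> str:
    """Toy adaptation rule: keep proposing the robot's plan only if the human is probably adaptable."""
    return "propose_robot_plan" if belief["adaptable"] >= threshold else "adopt_human_plan"


prior = {"adaptable": 0.5, "stubborn": 0.5}
posterior = update(prior, "insists_on_own_plan")
print(posterior)                # stubborn is now about 0.82
print(robot_policy(posterior))  # adopt_human_plan
```

A fixed threshold rule stands in for the planning step here; in a full formulation, the robot would instead choose actions that maximize expected team reward under the current belief, and the human's type could itself change over time.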
RO · Jun 14, 2017
Planning with Verbal Communication for Human-Robot Collaboration
Stefanos Nikolaidis, Minae Kwon, Jodi Forlizzi et al.
Human collaborators effectively coordinate their actions through both verbal and non-verbal communication. We believe that the same should hold for human-robot teams. We propose a formalism that enables a robot to decide optimally between doing a task and issuing an utterance. We focus on two types of utterances: verbal commands, where the robot expresses how it wants its human teammate to behave, and state-conveying actions, where the robot explains why it is behaving this way. Human subject experiments show that enabling the robot to issue verbal commands is the most effective form of communicating objectives, while retaining user trust in the robot. Communicating "why" information should be done judiciously, since many participants questioned the truthfulness of the robot's statements.
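As a rough illustration of deciding between doing a task and issuing an utterance, the Python sketch below compares one-step expected rewards for three options. This is a deliberately myopic stand-in, not the paper's formalism, which plans over longer horizons. The option names, the probability that the team ends up on the robot's plan (`p_on_plan`), and the time `cost` of speaking are all hypothetical values chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    p_on_plan: float  # hypothetical probability the team ends up executing the robot's plan
    cost: float       # hypothetical time cost of taking this option (speaking takes time)


def expected_value(option: Option, plan_reward: float) -> float:
    """One-step expected reward: reward if the team is on plan, minus the option's cost."""
    return option.p_on_plan * plan_reward - option.cost


def choose(options: list[Option], plan_reward: float) -> Option:
    """Pick the option with the highest expected value (a myopic stand-in for planning)."""
    return max(options, key=lambda o: expected_value(o, plan_reward))


OPTIONS = [
    Option("act_without_speaking", p_on_plan=0.5, cost=0.0),
    Option("verbal_command", p_on_plan=0.9, cost=0.2),
    Option("state_conveying_utterance", p_on_plan=0.7, cost=0.2),
]

print(choose(OPTIONS, plan_reward=1.0).name)  # verbal_command
print(choose(OPTIONS, plan_reward=0.3).name)  # act_without_speaking
```

With these made-up numbers, speaking pays for its cost only when coordinating on the robot's plan is worth enough; otherwise the robot simply acts.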