Jessica Maria Echterhoff

h-index6

8papers

210citations

Novelty48%

AI Score44

Ranked #49,702 of 194,257 authors (top 26%)#9,899 in CL (top 32%)

8 Papers

1.3CLJul 5, 2023

Comparing Apples to Apples: Generating Aspect-Aware Comparative Sentences from User Reviews

Jessica Echterhoff, An Yan, Julian McAuley

It is time-consuming to find the best product among many similar alternatives. Comparative sentences can help to contrast one item from others in a way that highlights important features of an item that stand out. Given reviews of one or multiple items and relevant item features, we generate comparative review sentences to aid users to find the best fit. Specifically, our model consists of three successive components in a transformer: (i) an item encoding module to encode an item for comparison, (ii) a comparison generation module that generates comparative sentences in an autoregressive manner, (iii) a novel decoding method for user personalization. We show that our pipeline generates fluent and diverse comparative sentences. We run experiments on the relevance and fidelity of our generated sentences in a human evaluation study and find that our algorithm creates comparative review sentences that are relevant and truthful.

36.7AIFeb 25, 2024Code

Cognitive Bias in Decision-Making with LLMs

Jessica Echterhoff, Yao Liu, Abeer Alessa et al.

Large language models (LLMs) offer significant potential as tools to support an expanding range of decision-making tasks. Given their training on human (created) data, LLMs have been shown to inherit societal biases against protected groups, as well as be subject to bias functionally resembling cognitive bias. Human-like bias can impede fair and explainable decisions made with LLM assistance. Our work introduces BiasBuster, a framework designed to uncover, evaluate, and mitigate cognitive bias in LLMs, particularly in high-stakes decision-making tasks. Inspired by prior research in psychology and cognitive science, we develop a dataset containing 13,465 prompts to evaluate LLM decisions on different cognitive biases (e.g., prompt-induced, sequential, inherent). We test various bias mitigation strategies, while proposing a novel method utilizing LLMs to debias their own human-like cognitive bias within prompts. Our analysis provides a comprehensive picture of the presence and effects of cognitive bias across commercial and open-source models. We demonstrate that our selfhelp debiasing effectively mitigates model answers that display patterns akin to human cognitive bias without having to manually craft examples for each bias.

10.4CVOct 25, 2023Code

Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving

Jessica Echterhoff, An Yan, Kyungtae Han et al.

Concept bottleneck models have been successfully used for explainable machine learning by encoding information within the model with a set of human-defined concepts. In the context of human-assisted or autonomous driving, explainability models can help user acceptance and understanding of decisions made by the autonomous vehicle, which can be used to rationalize and explain driver or vehicle behavior. We propose a new approach using concept bottlenecks as visual features for control command predictions and explanations of user and vehicle behavior. We learn a human-understandable concept layer that we use to explain sequential driving scenes while learning vehicle control commands. This approach can then be used to determine whether a change in a preferred gap or steering commands from a human (or autonomous vehicle) is led by an external stimulus or change in preferences. We achieve competitive performance to latent visual features while gaining interpretability within our model setup.

2.1HCAug 14, 2023

SpecTracle: Wearable Facial Motion Tracking from Unobtrusive Peripheral Cameras

Yinan Xuan, Varun Viswanath, Sunny Chu et al.

Facial motion tracking in head-mounted displays (HMD) has the potential to enable immersive "face-to-face" interaction in a virtual environment. However, current works on facial tracking are not suitable for unobtrusive augmented reality (AR) glasses or do not have the ability to track arbitrary facial movements. In this work, we demonstrate a novel system called SpecTracle that tracks a user's facial motions using two wide-angle cameras mounted right next to the visor of a Hololens. Avoiding the usage of cameras extended in front of the face, our system greatly improves the feasibility to integrate full-face tracking into a low-profile form factor. We also demonstrate that a neural network-based model processing the wide-angle cameras can run in real-time at 24 frames per second (fps) on a mobile GPU and track independent facial movement for different parts of the face with a user-independent model. Using a short personalized calibration, the system improves its tracking performance by 42.3% compared to the user-independent model.

22.6CLMar 13, 2024Code

Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation

Se-eun Yoon, Zhankui He, Jessica Maria Echterhoff et al.

Synthetic users are cost-effective proxies for real users in the evaluation of conversational recommender systems. Large language models show promise in simulating human-like behavior, raising the question of their ability to represent a diverse population of users. We introduce a new protocol to measure the degree to which language models can accurately emulate human behavior in conversational recommendation. This protocol is comprised of five tasks, each designed to evaluate a key property that a synthetic user should exhibit: choosing which items to talk about, expressing binary preferences, expressing open-ended preferences, requesting recommendations, and giving feedback. Through evaluation of baseline simulators, we demonstrate these tasks effectively reveal deviations of language models from human behavior, and offer insights on how to reduce the deviations with model selection and prompting strategies.

10.2CVJul 24, 2025

PDB-Eval: An Evaluation of Large Multimodal Models for Description and Explanation of Personalized Driving Behavior

Junda Wu, Jessica Echterhoff, Kyungtae Han et al.

Understanding a driver's behavior and intentions is important for potential risk assessment and early accident prevention. Safety and driver assistance systems can be tailored to individual drivers' behavior, significantly enhancing their effectiveness. However, existing datasets are limited in describing and explaining general vehicle movements based on external visual evidence. This paper introduces a benchmark, PDB-Eval, for a detailed understanding of Personalized Driver Behavior, and aligning Large Multimodal Models (MLLMs) with driving comprehension and reasoning. Our benchmark consists of two main components, PDB-X and PDB-QA. PDB-X can evaluate MLLMs' understanding of temporal driving scenes. Our dataset is designed to find valid visual evidence from the external view to explain the driver's behavior from the internal view. To align MLLMs' reasoning abilities with driving tasks, we propose PDB-QA as a visual explanation question-answering task for MLLM instruction fine-tuning. As a generic learning task for generative models like MLLMs, PDB-QA can bridge the domain gap without harming MLLMs' generalizability. Our evaluation indicates that fine-tuning MLLMs on fine-grained descriptions and explanations can effectively bridge the gap between MLLMs and the driving domain, which improves zero-shot performance on question-answering tasks by up to 73.2%. We further evaluate the MLLMs fine-tuned on PDB-X in Brain4Cars' intention prediction and AIDE's recognition tasks. We observe up to 12.5% performance improvements on the turn intention prediction task in Brain4Cars, and consistent performance improvements up to 11.0% on all tasks in AIDE.

6.7CLJul 3, 2025

How Much Content Do LLMs Generate That Induces Cognitive Bias in Users?

Abeer Alessa, Akshaya Lakshminarasimhan, Param Somane et al.

Large language models (LLMs) are increasingly integrated into applications ranging from review summarization to medical diagnosis support, where they affect human decisions. Even though LLMs perform well in many tasks, they may also inherit societal or cognitive biases, which can inadvertently transfer to humans. We investigate when and how LLMs expose users to biased content and quantify its severity. Specifically, we assess three LLM families in summarization and news fact-checking tasks, evaluating how much LLMs stay consistent with their context and/or hallucinate. Our findings show that LLMs expose users to content that changes the sentiment of the context in 21.86% of the cases, hallucinates on post-knowledge-cutoff data questions in 57.33% of the cases, and primacy bias in 5.94% of the cases. We evaluate 18 distinct mitigation methods across three LLM families and find that targeted interventions can be effective. Given the prevalent use of LLMs in high-stakes domains, such as healthcare or legal analysis, our results highlight the need for robust technical safeguards and for developing user-centered interventions that address LLM limitations.

5.8HCAug 17, 2020

PAR: Personal Activity Radius Camera View for Contextual Sensing

Jessica Maria Echterhoff, Edward J. Wang

Contextual sensing using wearable cameras has seen a variety of different camera angles proposed to capture a wide gamut of different visual scenes. In this paper, we propose a new camera view that aims to capture the same visual information as many of the camera positions and orientations combined from a single camera view point. The camera, mounted on the corner of a glasses frame is pointing downwards towards the floor, a field-of-view we named Personal Activity Radius (PAR). The PAR field-of-view captures the visual information around a wearer's personal bubble, including items they interact with, their body motion, their surrounding environment, etc. In our evaluation, we tested the PAR view's interpretability by human labelers in two different activity tracking scenarios: food related behaviors and exercise tracking. Human labelers achieved an overall high level of precision in identifying body motions in exercise tracking of 91% precision and eating/drinking motions at 96% precision. Item interaction identification reached a precision of 86% precision for labeling grocery categories. We show a high level on the device setup and contextual views we were able to capture with the device. We see that the camera wide angle captures different activities such as driving, shopping, gym exercises, walking and eating and can observe the specific interaction item of the user as well as the immediate contextual surrounding.