Catherine Yeh

HC
h-index28
8papers
613citations
Novelty49%
AI Score48

8 Papers

CLJun 12, 2022
Self-critiquing models for assisting human evaluators

William Saunders, Catherine Yeh, Jeff Wu et al.

We fine-tune large language models to write natural language critiques (natural language critical comments) using behavioral cloning. On a topic-based summarization task, critiques written by our models help humans find flaws in summaries that they would have otherwise missed. Our models help find naturally occurring flaws in both model and human written summaries, and intentional flaws in summaries written by humans to be deliberately misleading. We study scaling properties of critiquing with both topic-based summarization and synthetic tasks. Larger models write more helpful critiques, and on most tasks, are better at self-critiquing, despite having harder-to-critique outputs. Larger models can also integrate their own self-critiques as feedback, refining their own summaries into better ones. Finally, we motivate and introduce a framework for comparing critiquing ability to generation and discrimination ability. Our measurements suggest that even large models may still have relevant knowledge they cannot or do not articulate as critiques. These results are a proof of concept for using AI-assisted human feedback to scale the supervision of machine learning systems to tasks that are difficult for humans to evaluate directly. We release our training datasets, as well as samples from our critique assistance experiments.

CLFeb 15, 2023Code
Envisioning the Next-Gen Document Reader

Catherine Yeh, Nedim Lipka, Franck Dernoncourt

People read digital documents on a daily basis to share, exchange, and understand information in electronic settings. However, current document readers create a static, isolated reading experience, which does not support users' goals of gaining more knowledge and performing additional tasks through document interaction. In this work, we present our vision for the next-gen document reader that strives to enhance user understanding and create a more connected, trustworthy information experience. We describe 18 NLP-powered features to add to existing document readers and propose a novel plug-in marketplace that allows users to further customize their reading experience, as demonstrated through 3 exploratory UI prototypes available at https://github.com/catherinesyeh/nextgen-prototypes

CLJun 12, 2024Code
Designing a Dashboard for Transparency and Control of Conversational AI

Yida Chen, Aoyu Wu, Trevor DePodesta et al.

Conversational LLMs function as black box systems, leaving users guessing about why they see the output they do. This lack of transparency is potentially problematic, especially given concerns around bias and truthfulness. To address this issue, we present an end-to-end prototype-connecting interpretability techniques with user experience design-that seeks to make chatbots more transparent. We begin by showing evidence that a prominent open-source LLM has a "user model": examining the internal state of the system, we can extract data related to a user's age, gender, educational level, and socioeconomic status. Next, we describe the design of a dashboard that accompanies the chatbot interface, displaying this user model in real time. The dashboard can also be used to control the user model and the system's behavior. Finally, we discuss a study in which users conversed with the instrumented system. Our results suggest that users appreciate seeing internal states, which helped them expose biased behavior and increased their sense of control. Participants also made valuable suggestions that point to future directions for both design and machine learning research. The project page and video demo of our TalkTuner system are available at https://bit.ly/talktuner-project-page

HCJan 29
Vidmento: Creating Video Stories Through Context-Aware Expansion With Generative Video

Catherine Yeh, Anh Truong, Mira Dontcheva et al.

Video storytelling is often constrained by available material, limiting creative expression and leaving undesired narrative gaps. Generative video offers a new way to address these limitations by augmenting captured media with tailored visuals. To explore this potential, we interviewed eight video creators to identify opportunities and challenges in integrating generative video into their workflows. Building on these insights and established filmmaking principles, we developed Vidmento, a tool for authoring hybrid video stories that combine captured and generated media through context-aware expansion. Vidmento surfaces opportunities for story development, generates clips that blend stylistically and narratively with surrounding media, and provides controls for refinement. In a study with 12 creators, Vidmento supported narrative development and exploration by systematically expanding initial materials with generative media, enabling expressive video storytelling aligned with creative intent. We highlight how creators bridge story gaps with generative content and where they find this blending capability most valuable.

HCFeb 13, 2024
GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency

Catherine Yeh, Gonzalo Ramos, Rachel Ng et al. · microsoft-research

Large language models (LLMs) have become ubiquitous in providing different forms of writing assistance to different writers. However, LLM-powered writing systems often fall short in capturing the nuanced personalization and control needed to effectively support users -- particularly for those who lack experience with prompt engineering. To address these challenges, we introduce GhostWriter, an AI-enhanced design probe that enables users to exercise enhanced agency and personalization during writing. GhostWriter leverages LLMs to implicitly learn the user's intended writing style for seamless personalization, while exposing explicit teaching moments for style refinement and reflection. We study 18 participants who use GhostWriter on two distinct writing tasks, observing that it helps users craft personalized text generations and empowers them by providing multiple ways to control the system's writing style. Based on this study, we present insights on how specific design choices can promote greater user agency in AI-assisted writing and discuss people's evolving relationships with such systems. We conclude by offering design recommendations for future work.

HCSep 1, 2025
Chronotome: Real-Time Topic Modeling for Streaming Embedding Spaces

Matte Lim, Catherine Yeh, Martin Wattenberg et al.

Many real-world datasets -- from an artist's body of work to a person's social media history -- exhibit meaningful semantic changes over time that are difficult to capture with existing dimensionality reduction methods. To address this gap, we introduce a visualization technique that combines force-based projection and streaming clustering methods to build a spatial-temporal map of embeddings. Applying this technique, we create Chronotome, a tool for interactively exploring evolving themes in time-based data -- in real time. We demonstrate the utility of our approach through use cases on text and image data, showing how it offers a new lens for understanding the aesthetics and semantics of temporal datasets.

HCAug 9, 2025
Story Ribbons: Reimagining Storyline Visualizations with Large Language Models

Catherine Yeh, Tara Menon, Robin Singh Arya et al.

Analyzing literature involves tracking interactions between characters, locations, and themes. Visualization has the potential to facilitate the mapping and analysis of these complex relationships, but capturing structured information from unstructured story data remains a challenge. As large language models (LLMs) continue to advance, we see an opportunity to use their text processing and analysis capabilities to augment and reimagine existing storyline visualization techniques. Toward this goal, we introduce an LLM-driven data parsing pipeline that automatically extracts relevant narrative information from novels and scripts. We then apply this pipeline to create Story Ribbons, an interactive visualization system that helps novice and expert literary analysts explore detailed character and theme trajectories at multiple narrative levels. Through pipeline evaluations and user studies with Story Ribbons on 36 literary works, we demonstrate the potential of LLMs to streamline narrative visualization creation and reveal new insights about familiar stories. We also describe current limitations of AI-based systems, and interaction motifs designed to address these issues.

HCMay 4, 2023
AttentionViz: A Global View of Transformer Attention

Catherine Yeh, Yida Chen, Aoyu Wu et al.

Transformer models are revolutionizing machine learning, but their inner workings remain mysterious. In this work, we present a new visualization technique designed to help researchers understand the self-attention mechanism in transformers that allows these models to learn rich, contextual relationships between elements of a sequence. The main idea behind our method is to visualize a joint embedding of the query and key vectors used by transformer models to compute attention. Unlike previous attention visualization techniques, our approach enables the analysis of global patterns across multiple input sequences. We create an interactive visualization tool, AttentionViz (demo: http://attentionviz.com), based on these joint query-key embeddings, and use it to study attention mechanisms in both language and vision transformers. We demonstrate the utility of our approach in improving model understanding and offering new insights about query-key interactions through several application scenarios and expert feedback.