99.8HCApr 13
From Words to Widgets for Controllable LLM GenerationChao Zhang, Yiren Liu, Lunyiu Nie et al. · allen-ai
Natural language remains the predominant way people interact with large language models (LLMs). However, users often struggle to precisely express and control subjective preferences (e.g., tone, style, and emphasis) through prompting. We propose Malleable Prompting, a new interactive prompting technique for controllable LLM generation. It reifies preference expressions in natural language prompts into GUI widgets (e.g., sliders, dropdowns, and toggles) that users can directly configure to steer generation, while visualizing each control's influence on the output to support attribution and comparison across iterations. To enable this interaction, we introduce an LLM decoding algorithm that modulates the token probability distribution during generation based on preference expressions and their widget values. Through a user study, we show that Malleable Prompting helps participants achieve target preferences more precisely and is perceived as more controllable and transparent than natural language prompting alone.
CLJul 2, 2024
What We Talk About When We Talk About LMs: Implicit Paradigm Shifts and the Ship of Language ModelsShengqi Zhu, Jeffrey M. Rzeszotarski
The term Language Models (LMs) as a time-specific collection of models of interest is constantly reinvented, with its referents updated much like the $\textit{Ship of Theseus}$ replaces its parts but remains the same ship in essence. In this paper, we investigate this $\textit{Ship of Language Models}$ problem, wherein scientific evolution takes the form of continuous, implicit retrofits of key existing terms. We seek to initiate a novel perspective of scientific progress, in addition to the more well-studied emergence of new terms. To this end, we construct the data infrastructure based on recent NLP publications. Then, we perform a series of text-based analyses toward a detailed, quantitative understanding of the use of Language Models as a term of art. Our work highlights how systems and theories influence each other in scientific discourse, and we call for attention to the transformation of this Ship that we all are contributing to.
13.0HCMar 23
Embodying Facts, Figures, and Faiths in Narrative Artistic Performances in Rural BangladeshSharifa Sultana, Zinnat Sultana, Jeffrey M. Rzeszotarski et al.
There is an increasing interest in telling serious stories with data. Designers organize information, construct narratives, and present findings to inform audiences. However, many of these practices emerge from modern information visualization rhetoric and ethical frameworks which may marginalize communities with low digital and media literacy. In a ten-month-long ethnographic study in three Bangladeshi villages, we investigated how these communities use entertainment and cultural practices, namely Puthi, Bhandari Gaan, and Pot music, to instruct, communicate traditional moral lessons and recall history. We found that these communities embrace polyvocality and multiple ethical frameworks in their performances, construct narratives combining factuality, emotionality, and aesthetics, and adapt their performances to changing technology and audience needs. Our findings provide HCI, visualization, and ethical data practitioners with implications for the design of accessible and culturally appropriate ways of presenting data narratives in data-driven systems.
97.0HCMay 7
Priming, Path-dependence, and Plasticity: Understanding the molding of user-LLM interaction and its implications from (many) chat logs in the wildShengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno
User interactions with LLMs are shaped by prior experiences and individual exploration, but in-lab studies do not provide system designers with visibility into these in-the-wild factors. This work explores a new approach to studying real-world user-LLM interactions through large-scale chat logs from the wild. Through analysis of 140K chatbot sessions from 7,955 anonymized global users over time, we demonstrate key patterns in user expressions despite varied tasks: (1) LLM users are not tabula rasa, nor are they constantly adapting; rather, interaction patterns form and stabilize rapidly through individual early trajectories; (2) Longitudinal outcomes, such as recurring text patterns and retention rates, are strongly correlated with early exploration; (3) Parallel dynamics are present, including organizing expressions by task types such as emotional support, or in response to model-version updates. These results present an ``agency paradox'': despite LLM input spaces being unconstrained and user-driven, we in fact see less user exploration. We call for design consideration surrounding the molding procedure and its incorporation in future research.
CLAug 2, 2025
Show or Tell? Modeling the evolution of request-making in Human-LLM conversationsShengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno
Designing user-centered LLM systems requires understanding how people use them, but patterns of user behavior are often masked by the variability of queries. In this work, we introduce a new framework to describe request-making that segments user input into request content, roles assigned, query-specific context, and the remaining task-independent expressions. We apply the workflow to create and analyze a dataset of 211k real-world queries based on WildChat. Compared with similar human-human setups, we find significant differences in the language for request-making in the human-LLM scenario. Further, we introduce a novel and essential perspective of diachronic analyses with user expressions, which reveals fundamental and habitual user-LLM interaction patterns beyond individual task completion. We find that query patterns evolve from early ones emphasizing sole requests to combining more context later on, and individual users explore expression patterns but tend to converge with more experience. From there, we propose to understand communal trends of expressions underlying distinct tasks and discuss the preliminary findings. Finally, we discuss the key implications for user studies, computational pragmatics, and LLM alignment.
HCMay 26, 2025
Fairness-in-the-Workflow: How Machine Learning Practitioners at Big Tech Companies Approach Fairness in Recommender SystemsJing Nathan Yan, Emma Harvey, Junxiong Wang et al.
Recommender systems (RS), which are widely deployed across high-stakes domains, are susceptible to biases that can cause large-scale societal impacts. Researchers have proposed methods to measure and mitigate such biases -- but translating academic theory into practice is inherently challenging. RS practitioners must balance the competing interests of diverse stakeholders, including providers and users, and operate in dynamic environments. Through a semi-structured interview study (N=11), we map the RS practitioner workflow within large technology companies, focusing on how technical teams consider fairness internally and in collaboration with other (legal, data, and fairness) teams. We identify key challenges to incorporating fairness into existing RS workflows: defining fairness in RS contexts, particularly when navigating multi-stakeholder and dynamic fairness considerations. We also identify key organization-wide challenges: making time for fairness work and facilitating cross-team communication. Finally, we offer actionable recommendations for the RS community, including HCI researchers and practitioners.