HC · Jul 10, 2024
On LLM Wizards: Identifying Large Language Models' Behaviors for Wizard of Oz Experiments
Jingchao Fang, Nikos Arechiga, Keiichi Namikoshi et al.
The Wizard of Oz (WoZ) method is a widely adopted research approach where a human Wizard "role-plays" a not readily available technology and interacts with participants to elicit user behaviors and probe the design space. With the growing ability of modern large language models (LLMs) to role-play, one can apply LLMs as Wizards in WoZ experiments with better scalability and lower cost than the traditional approach. However, methodological guidance on responsibly applying LLMs in WoZ experiments and a systematic evaluation of LLMs' role-playing ability are lacking. Through two LLM-powered WoZ studies, we take the first step towards identifying an experiment lifecycle for researchers to safely integrate LLMs into WoZ experiments and interpret data generated from settings that involve Wizards role-played by LLMs. We also contribute a heuristic-based evaluation framework that allows the estimation of LLMs' role-playing ability in WoZ experiments and reveals LLMs' behavior patterns at scale.
HC · Apr 29
What Influences Readers' and Writers' Perceived Necessity of AI Disclosure?
Jingchao Fang, Victoria Xiaohan Wen, Mina Lee
The growing capability of artificial intelligence (AI) leads to its increasing adoption in writing, spurring discussions around whether writers should disclose their AI use in writing. What influences the perceived necessity of disclosure? We look into this question from three dimensions: perspective (reader or writer of the text), purpose (the goal of reading or writing), and procedural factors (how AI was used in the writing process in terms of replaceability, effortfulness, intentionality, and directness). In a vignette study (N = 727), we find that readers consider disclosure to be more necessary than writers, and disclosure is regarded as more necessary when AI's contribution in writing is irreplaceable, directly incorporated, and when the writer does not intentionally steer AI generation. To our surprise, the writers' intentionality of AI use produces contrasting effects on readers' and writers' perceived necessity of disclosure. Moreover, the effort of writing shows no significant effect on the perceived necessity. This study contributes to the conversation on transparent AI use by revealing readers' and writers' grassroots judgments, providing a unique angle to reflect on existing regulations, and offering insights into how AI disclosure guidance and tools could be designed to better align with readers' and writers' perceptions.
CL · Oct 30, 2024
Leveraging Language Models and Bandit Algorithms to Drive Adoption of Battery-Electric Vehicles
Keiichi Namikoshi, David A. Shamma, Rumen Iliev et al.
Behavior change interventions are important for coordinating societal action across a wide array of applications, including the adoption of electrified vehicles to reduce emissions. Prior work has demonstrated that behavior change interventions must be personalized, and that the intervention that is most effective on average across a large group can produce a backlash effect that strengthens opposition among some subgroups. Thus, it is important to target interventions to different audiences and to present them in a natural, conversational style. In this context, an important emerging application domain for large language models (LLMs) is conversational interventions for behavior change. In this work, we build on prior work on understanding the values that motivate the adoption of battery electric vehicles. We combine new advances in LLMs with a contextual bandit to develop conversational interventions that are personalized to the values of each study participant. The contextual bandit algorithm learns to target values based on the demographics of each participant. To train our bandit algorithm in an offline manner, we use LLMs to play the role of study participants. We benchmark the persuasive effectiveness of our bandit-enhanced LLM against an unaided LLM generating conversational interventions without demographic-targeted values.
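To illustrate the kind of setup this abstract describes, here is a minimal sketch of a contextual bandit that learns which value to target for each demographic group, trained offline against a simulated participant. It is not the authors' implementation: the abstract does not name a specific bandit algorithm, so a tabular epsilon-greedy rule is swapped in for simplicity. The demographic groups, value names, and response rates are all invented for illustration, and the `simulated_participant` function is a hypothetical stand-in for an LLM role-playing a study participant.

```python
import random
from collections import defaultdict

# Hypothetical contexts (demographic groups) and arms (values to target).
# None of these names or numbers come from the paper.
CONTEXTS = ["urban", "rural"]
ARMS = ["environment", "cost_savings", "performance"]

# Invented probability that a simulated participant responds positively
# to a message targeting a given value. Assumption for illustration only.
HYPOTHETICAL_RESPONSE_RATE = {
    ("urban", "environment"): 0.8,
    ("urban", "cost_savings"): 0.3,
    ("urban", "performance"): 0.2,
    ("rural", "environment"): 0.2,
    ("rural", "cost_savings"): 0.8,
    ("rural", "performance"): 0.3,
}


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy bandit: one running mean reward per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.means = defaultdict(float)

    def select(self, context):
        # Explore with probability epsilon, otherwise pick the best-known arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.means[(context, a)])

    def update(self, context, arm, reward):
        # Incremental update of the running mean reward for this pair.
        key = (context, arm)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]


def simulated_participant(context, arm, rng):
    """Hypothetical stand-in for an LLM playing a study participant.

    Returns 1.0 if the simulated participant is persuaded, else 0.0.
    """
    return 1.0 if rng.random() < HYPOTHETICAL_RESPONSE_RATE[(context, arm)] else 0.0


def train_offline(rounds=5000, seed=0):
    """Train the bandit against simulated participants only."""
    rng = random.Random(seed)
    bandit = EpsilonGreedyContextualBandit(ARMS, epsilon=0.1, seed=seed)
    for _ in range(rounds):
        context = rng.choice(CONTEXTS)
        arm = bandit.select(context)
        reward = simulated_participant(context, arm, rng)
        bandit.update(context, arm, reward)
    return bandit


def best_arm(bandit, context):
    """Value with the highest estimated reward for a demographic group."""
    return max(bandit.arms, key=lambda a: bandit.means[(context, a)])


if __name__ == "__main__":
    trained = train_offline()
    for ctx in CONTEXTS:
        print(ctx, "->", best_arm(trained, ctx))
```

In a fuller system, the selected value would presumably condition the prompt of the conversational LLM, and the reward would come from a measured or simulated change in attitude rather than a fixed probability table.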