Supporting Sensemaking of Large Language Model Outputs at Scale
This work addresses a practical problem for end-users and system designers, namely interpreting LLM outputs, but its contribution is incremental because it builds on existing methods for text analysis.
The paper addresses the problem of helping users and designers make sense of multiple LLM responses to a single prompt. It designs five features for presenting those responses, including novel methods for computing similarities and differences, and finds in a controlled user study (n=24) and eight case studies that these features support a variety of sensemaking tasks and make previously difficult tasks tractable.
Large language models (LLMs) are capable of generating multiple responses to a single prompt, yet little effort has been expended to help end-users or system designers make use of this capability. In this paper, we explore how to present many LLM responses at once. We design five features, which include both pre-existing and novel methods for computing similarities and differences across textual documents, as well as how to render their outputs. We report on a controlled user study (n=24) and eight case studies evaluating these features and how they support users in different tasks. We find that the features support a wide variety of sensemaking tasks and even make tractable tasks that our participants previously considered too difficult. Finally, we present design guidelines to inform future explorations of new LLM interfaces.