HC AI | Aug 9, 2025

Highlight All the Phrases: Enhancing LLM Transparency through Visual Factuality Indicators

Hyo Jin Do, Rachel Ostrand, Werner Geyer, Keerthiram Murugesan, Dennis Wei, Justin Weisz

arXiv:2508.06846v1 | 7.21 citations | h-index: 16 | Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society

Originality: Synthesis-oriented

AI Analysis

This work addresses a gap in how LLM factuality is communicated to users, offering developers and designers incremental design guidelines to enhance transparency and trust.

The study tackled the problem of communicating LLM factuality scores to users by comparing design strategies in two scenario-based experiments with a total of 208 participants. Participants preferred and trusted a design that color-coded all phrases by factuality score, and they found it easier to validate response accuracy with this design than with an unstyled baseline.

Large language models (LLMs) are susceptible to generating inaccurate or false information, often referred to as "hallucinations" or "confabulations." While several technical advancements have been made to detect hallucinated content by assessing the factuality of the model's responses, there is still limited research on how to effectively communicate this information to users. To address this gap, we conducted two scenario-based experiments with a total of 208 participants to systematically compare the effects of various design strategies for communicating factuality scores by assessing participants' ratings of trust, ease in validating response accuracy, and preference. Our findings reveal that participants preferred and trusted a design in which all phrases within a response were color-coded based on factuality scores. Participants also found it easier to validate accuracy of the response in this style compared to a baseline with no style applied. Our study offers practical design guidelines for LLM application developers and designers, aimed at calibrating user trust, aligning with user preferences, and enhancing users' ability to scrutinize LLM outputs.
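The "highlight all phrases" design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0-to-1 score range, the thresholds (0.7 and 0.4), and the three colors are all assumptions chosen for illustration, and the abstract does not say how phrases are segmented or scored. The sketch takes phrases that already carry a factuality score and wraps every one of them in a color-coded HTML span.

```python
from html import escape

# Hypothetical thresholds and colors; the abstract does not specify them.
HIGH_THRESHOLD = 0.7
MEDIUM_THRESHOLD = 0.4
COLOR_HIGH = "#c8e6c9"    # light green: likely factual
COLOR_MEDIUM = "#fff9c4"  # light yellow: uncertain
COLOR_LOW = "#ffcdd2"     # light red: likely inaccurate


def score_to_color(score: float) -> str:
    """Map a factuality score in [0, 1] to a background color."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= HIGH_THRESHOLD:
        return COLOR_HIGH
    if score >= MEDIUM_THRESHOLD:
        return COLOR_MEDIUM
    return COLOR_LOW


def highlight_all_phrases(phrases: list[tuple[str, float]]) -> str:
    """Wrap every (phrase, score) pair in a color-coded <span>.

    Every phrase is styled, not only the low-scoring ones, which mirrors
    the design that participants preferred in the study.
    """
    spans = []
    for text, score in phrases:
        color = score_to_color(score)
        spans.append(
            f'<span style="background-color: {color}" '
            f'title="factuality: {score:.2f}">{escape(text)}</span>'
        )
    return " ".join(spans)


if __name__ == "__main__":
    # Example scores are made up for demonstration only.
    response = [
        ("Paris is the capital of France,", 0.95),
        ("with a population of about 5 million", 0.30),
    ]
    print(highlight_all_phrases(response))
```

Because color alone can be hard to read for some users, the sketch also puts the numeric score in the `title` attribute. That is a design choice of this example, not a finding of the paper.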
