HCAI | Jul 8, 2024

Exploring Human-LLM Conversations: Mental Models and the Originator of Toxicity

arXiv:2407.05977v1 | 6 citations | h-index: 6
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of toxicity attribution in AI interactions for researchers and developers, though its contribution is incremental, consisting mainly of a shift in focus to real-world settings.

The study investigated the source of toxic content in human-LLM conversations and found that toxicity is often provoked by humans seeking such content. A manual analysis of hundreds of conversations flagged as toxic also calls current refusal practices into question.

This study explores real-world human interactions with large language models (LLMs) in diverse, unconstrained settings, in contrast to most prior research, which focuses on ethically trimmed models like ChatGPT for specific tasks. We aim to understand the originator of toxicity. Our findings show that although LLMs are rightfully accused of providing toxic content, it is mostly demanded or at least provoked by humans who actively seek such content. Our manual analysis of hundreds of conversations judged as toxic by the APIs of commercial vendors also raises questions about current practices regarding which user requests are refused. Furthermore, we conjecture, based on multiple empirical indicators, that humans exhibit a change in their mental model, switching from the mindset of interacting with a machine towards that of interacting with a human.

