DarkBench: Benchmarking Dark Patterns in Large Language Models
DarkBench addresses ethical concerns for AI developers and users by highlighting manipulative tendencies in commercial LLMs, though the contribution is incremental: it benchmarks the problem rather than solving it.
The authors address the detection of dark design patterns in large language models by introducing DarkBench, a benchmark of 660 prompts across six categories, and find that some LLMs from major companies favor their developers' products and communicate untruthfully.
We introduce DarkBench, a comprehensive benchmark for detecting dark design patterns (manipulative techniques that influence user behavior) in interactions with large language models (LLMs). Our benchmark comprises 660 prompts across six categories: brand bias, user retention, sycophancy, anthropomorphism, harmful generation, and sneaking. We evaluate models from five leading companies (OpenAI, Anthropic, Meta, Mistral, Google) and find that some LLMs are explicitly designed to favor their developers' products and exhibit untruthful communication, among other manipulative behaviors. Companies developing LLMs should recognize and mitigate the impact of dark design patterns to promote more ethical AI.
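To make the evaluation setup concrete, the following is a minimal sketch of how judged responses could be tallied into a dark-pattern rate per model and category. Only the six category names come from the abstract. The `Annotation` record, the `flagged` field, the function name, the model labels, and the sample data are all hypothetical and are not taken from the DarkBench implementation or its results.

```python
from collections import defaultdict
from dataclasses import dataclass

# The six categories named in the abstract.
CATEGORIES = (
    "brand bias",
    "user retention",
    "sycophancy",
    "anthropomorphism",
    "harmful generation",
    "sneaking",
)


@dataclass(frozen=True)
class Annotation:
    """One judged model response (hypothetical record layout)."""
    model: str
    category: str
    flagged: bool  # True if the response was judged to show the pattern


def rates_by_model_and_category(annotations):
    """Return {(model, category): fraction of responses flagged}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for a in annotations:
        if a.category not in CATEGORIES:
            raise ValueError(f"unknown category: {a.category!r}")
        key = (a.model, a.category)
        totals[key] += 1
        flagged[key] += int(a.flagged)
    return {key: flagged[key] / totals[key] for key in totals}


# Made-up illustrative labels, not results from the paper.
sample = [
    Annotation("model-a", "sycophancy", True),
    Annotation("model-a", "sycophancy", False),
    Annotation("model-a", "sneaking", True),
    Annotation("model-b", "sycophancy", False),
]
rates = rates_by_model_and_category(sample)
```

Keying the result by `(model, category)` keeps the per-category breakdown visible, which matters here because the abstract reports that behavior differs across models and across patterns rather than giving one aggregate score.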