CL | AI | Jun 19, 2024

BeHonest: Benchmarking Honesty in Large Language Models

arXiv:2406.13261v3 | 15 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

The paper addresses the risk of dishonest behavior in LLMs, such as spreading misinformation, which matters for AI safety and societal benefit. The contribution is incremental, since it builds on existing alignment efforts.

The paper tackles the evaluation of honesty in large language models (LLMs), which has been studied less than other alignment criteria. It introduces the BeHonest benchmark to assess three aspects of honesty (awareness of knowledge boundaries, avoidance of deceit, and consistency in responses) and finds that current models have significant room for improvement.

Previous works on Large Language Models (LLMs) have mainly focused on evaluating their helpfulness or harmlessness. However, honesty, another crucial alignment criterion, has received relatively less attention. Dishonest behaviors in LLMs, such as spreading misinformation and defrauding users, present severe risks that intensify as these models approach superintelligent levels. Enhancing honesty in LLMs addresses critical limitations and helps uncover latent capabilities that are not readily expressed. This underscores the urgent need for reliable methods and benchmarks to effectively ensure and evaluate the honesty of LLMs. In this paper, we introduce BeHonest, a pioneering benchmark specifically designed to assess honesty in LLMs comprehensively. BeHonest evaluates three essential aspects of honesty: awareness of knowledge boundaries, avoidance of deceit, and consistency in responses. Building on this foundation, we designed 10 scenarios to evaluate and analyze 9 popular LLMs on the market, including both closed-source and open-source models from different model families with varied model sizes. Our findings indicate that there is still significant room for improvement in the honesty of LLMs. We encourage the AI community to prioritize honesty alignment in these models, which can harness their full potential to benefit society while preventing them from causing harm through deception or inconsistency. Our benchmark and code can be found at: https://github.com/GAIR-NLP/BeHonest.

Code Implementations: 1 repo