CL CR Nov 30, 2023

FFT: Towards Harmlessness Evaluation and Analysis for LLMs with Factuality, Fairness, Toxicity

arXiv:2311.18580v2
25 citations
h-index: 11
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for better harmlessness evaluation of LLMs, which matters for AI safety and ethics, though its contribution is incremental because it builds on existing benchmarks.

The paper tackles the problem of evaluating harmlessness in large language models (LLMs) by proposing FFT, a new benchmark with 2,116 instances for assessing factuality, fairness, and toxicity. It finds that the harmlessness of the 9 evaluated LLMs is still unsatisfactory.

The widespread adoption of generative artificial intelligence has heightened concerns about the potential harms posed by AI-generated texts, primarily stemming from nonfactual, unfair, and toxic content. Previous researchers have invested much effort in assessing the harmlessness of generative language models. However, existing benchmarks are struggling in the era of large language models (LLMs), due to their stronger language generation and instruction-following capabilities, as well as their wider applications. In this paper, we propose FFT, a new benchmark with 2,116 elaborately designed instances, for LLM harmlessness evaluation with factuality, fairness, and toxicity. To investigate the potential harms of LLMs, we evaluate 9 representative LLMs covering various parameter scales, training stages, and creators. Experiments show that the harmlessness of LLMs is still unsatisfactory, and extensive analysis yields some insightful findings that could inspire future research on harmless LLMs.

Code Implementations: 1 repo
