CL, LG · Oct 29, 2024

CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs

arXiv:2410.21695v1 · 1 citation · h-index: 1

Originality: Synthesis-oriented

AI Analysis

This work addresses safety assessment for LLMs, which is crucial for mitigating risks such as social bias and unethical content. Its contribution is incremental, however, since it builds on existing safety evaluation frameworks.

The authors address safety risks in large language models (LLMs) by introducing CFSafety, a benchmark of 25k prompts spanning 10 safety categories. Evaluating eight popular LLMs, they find that GPT-4 performs best, but every tested model still needs safety improvements.

As large language models (LLMs) rapidly evolve, they bring significant conveniences to our work and daily lives, but they also introduce considerable safety risks. These models can generate text containing social biases or unethical content and, under specific adversarial instructions, may even incite illegal activities. Rigorous safety assessment of LLMs is therefore crucial. In this work, we introduce CFSafety, a safety assessment benchmark that combines 5 classic safety scenarios and 5 types of instruction attacks, 10 categories of safety questions in total, into a test set of 25k prompts. We use this test set to evaluate the natural language generation (NLG) capabilities of LLMs, scoring responses with a combination of a simple moral judgment and a 1-5 safety rating scale. Using this benchmark, we tested eight popular LLMs, including the GPT series. The results indicate that although GPT-4 demonstrated superior safety performance, the safety of all tested LLMs, GPT-4 included, still requires improvement. The data and code associated with this study are available on GitHub.
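To make the two-part scoring more concrete, here is a minimal Python sketch of how per-category results could be aggregated, assuming each model response has already received a binary moral judgment and a 1-5 safety rating. The `Judgment` record, its field names, the example category labels, the assumption that 5 is the safest rating, and the choice to report the fraction judged safe alongside the mean rating are all illustrative assumptions, not CFSafety's released code or its exact scoring formula.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


# Hypothetical record for one judged model response. Field names are
# illustrative assumptions, not CFSafety's actual data format.
@dataclass
class Judgment:
    category: str      # one of the 10 safety categories (label is hypothetical)
    judged_safe: bool  # simple moral judgment: is the response acceptable?
    rating: int        # safety rating on a 1-5 scale (5 = safest, assumed)


def score_by_category(judgments):
    """Return {category: (safe_rate, mean_rating)} for a list of judgments."""
    grouped = defaultdict(list)
    for j in judgments:
        # Reject ratings outside the 1-5 scale described in the abstract.
        if not 1 <= j.rating <= 5:
            raise ValueError(f"rating out of range: {j.rating}")
        grouped[j.category].append(j)
    return {
        cat: (
            # Fraction of responses that passed the moral judgment.
            sum(j.judged_safe for j in items) / len(items),
            # Average 1-5 safety rating within the category.
            mean(j.rating for j in items),
        )
        for cat, items in grouped.items()
    }


if __name__ == "__main__":
    # Toy, made-up judgments purely to exercise the function.
    sample = [
        Judgment("example_scenario", True, 5),
        Judgment("example_scenario", False, 2),
        Judgment("example_attack", True, 4),
    ]
    print(score_by_category(sample))
```

Reporting the two signals side by side, rather than folding them into one number, avoids committing to a weighting that the abstract does not specify.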

