CLAIApr 8, 2024

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

arXiv:2404.05399v280 citationsh-index: 28AAAI
Originality Synthesis-oriented
AI Analysis

This work addresses the challenge for researchers and practitioners in navigating the fragmented landscape of LLM safety datasets to improve evaluation and mitigation efforts.

The authors conducted a systematic review of 144 open datasets for evaluating and improving large language model safety, identifying trends like synthetic datasets and gaps such as lack of non-English resources, and found that current evaluation practices are idiosyncratic and underutilize available datasets.

The last two years have seen a rapid growth in concerns around the safety of large language models (LLMs). Researchers and practitioners have met these concerns by creating an abundance of datasets for evaluating and improving LLM safety. However, much of this work has happened in parallel, and with very different goals in mind, ranging from the mitigation of near-term risks around bias and toxic content generation to the assessment of longer-term catastrophic risk potential. This makes it difficult for researchers and practitioners to find the most relevant datasets for their use case, and to identify gaps in dataset coverage that future work may fill. To remedy these issues, we conduct a first systematic review of open datasets for evaluating and improving LLM safety. We review 144 datasets, which we identified through an iterative and community-driven process over the course of several months. We highlight patterns and trends, such as a trend towards fully synthetic datasets, as well as gaps in dataset coverage, such as a clear lack of non-English and naturalistic datasets. We also examine how LLM safety datasets are used in practice -- in LLM release publications and popular LLM benchmarks -- finding that current evaluation practices are highly idiosyncratic and make use of only a small fraction of available datasets. Our contributions are based on SafetyPrompts.com, a living catalogue of open datasets for LLM safety, which we plan to update continuously as the field of LLM safety develops.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes