AI, Feb 18, 2023

Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements

Jiawen Deng, Jiale Cheng, Hao Sun, Zhexin Zhang, Minlie Huang

arXiv:2302.09270v318.725 citations, h-index: 12

Originality: Synthesis-oriented

AI Analysis

The paper addresses safety concerns for the AI ecosystem, but as a survey its contribution is incremental: it synthesizes existing knowledge rather than introducing new methods.

This survey tackles the problem of safety risks in generative large language models by presenting a comprehensive framework that delineates safety risks, evaluation methods, and improvement strategies, aiming to provide technical guidance for researchers.

As generative large model capabilities advance, safety concerns become more pronounced in their outputs. To ensure the sustainable growth of the AI ecosystem, it's imperative to undertake a holistic evaluation and refinement of associated safety risks. This survey presents a framework for safety research pertaining to large models, delineating the landscape of safety risks as well as safety evaluation and improvement methods. We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models, encompassing preference-based testing, adversarial attack approaches, issues detection, and other advanced evaluation methods. Additionally, we explore the strategies for enhancing large model safety from training to deployment, highlighting cutting-edge safety approaches for each stage in building large models. Finally, we discuss the core challenges in advancing towards more responsible AI, including the interpretability of safety mechanisms, ongoing safety issues, and robustness against malicious attacks. Through this survey, we aim to provide clear technical guidance for safety researchers and encourage further study on the safety of large models.
