CL | Apr 24, 2025

Safety in Large Reasoning Models: A Survey

Cheng Wang, Yue Liu, Baolong Bi, Duzhen Zhang, Zhong-Zhi Li, Yingwei Ma, Yufei He, Shengju Yu, Xinfeng Li, Junfeng Fang, Jiaheng Zhang, Bryan Hooi

arXiv:2504.17704v331.268 citations | h-index: 14 | EMNLP

Originality: Synthesis-oriented

AI Analysis

The survey addresses safety concerns facing researchers and developers who deploy LRMs in real-world settings. Its contribution is incremental: it synthesizes existing knowledge rather than introducing new methods.

This survey examines safety risks in Large Reasoning Models (LRMs). It summarizes known vulnerabilities, attacks, and defense strategies, and organizes them into a taxonomy that gives a structured view of the current safety landscape.

Large Reasoning Models (LRMs) have exhibited extraordinary prowess in tasks like mathematics and coding, leveraging their advanced reasoning capabilities. Nevertheless, as these capabilities progress, significant concerns regarding their vulnerabilities and safety have arisen, which can pose challenges to their deployment and application in real-world settings. This paper presents a comprehensive survey of LRMs, meticulously exploring and summarizing the newly emerged safety risks, attacks, and defense strategies. By organizing these elements into a detailed taxonomy, this work aims to offer a clear and structured understanding of the current safety landscape of LRMs, facilitating future research and development to enhance the security and reliability of these powerful models.
