CR AI CL | Mar 19, 2025

Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings

Zonghao Ying, Guangyi Zheng, Yongxin Huang, Deyue Zhang, Wenxin Zhang, Quanchen Zou, Aishan Liu, Xianglong Liu, Dacheng Tao

arXiv:2503.15092v125.532 citations | h-index: 28 | Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses safety risks that Chinese-developed AI models pose for users and developers, but the work is incremental because it applies existing evaluation methods to new models.

This study conducted the first comprehensive safety evaluation of DeepSeek models, including large language, multimodal, and text-to-image models, and found significant safety vulnerabilities such as algorithmic discrimination and sexual content despite their strong general capabilities.

This study presents the first comprehensive safety evaluation of the DeepSeek models, focusing on evaluating the safety risks associated with their generated content. Our evaluation encompasses DeepSeek's latest generation of large language models, multimodal large language models, and text-to-image models, systematically examining their performance regarding unsafe content generation. Notably, we developed a bilingual (Chinese-English) safety evaluation dataset tailored to Chinese sociocultural contexts, enabling a more thorough evaluation of the safety capabilities of Chinese-developed models. Experimental results indicate that despite their strong general capabilities, DeepSeek models exhibit significant safety vulnerabilities across multiple risk dimensions, including algorithmic discrimination and sexual content. These findings provide crucial insights for understanding and improving the safety of large foundation models. Our code is available at https://github.com/NY1024/DeepSeek-Safety-Eval.
