CR | AI | CL | LG | Feb 11, 2025

Trustworthy AI: Safety, Bias, and Privacy -- A Survey

arXiv:2502.10450v2 | 6 citations | h-index: 3
Originality: Synthesis-oriented
AI Analysis

This survey addresses trustworthy AI for researchers and practitioners working on AI systems; its contribution to the field is incremental.

This paper examines trustworthy AI through three concerns (safety, bias, and privacy) and presents insights and perspectives on each. The result is a comprehensive survey of the current state of the field; it reports no specific numbers or metrics.

The capabilities of artificial intelligence systems have advanced considerably, but these systems still struggle with failure modes, vulnerabilities, and biases. In this paper, we study the current state of the field and present promising insights and perspectives on the concerns that challenge the trustworthiness of AI models. In particular, we investigate three thrusts in which such issues undermine models' trustworthiness: safety, privacy, and bias. For safety, we discuss safety alignment in large language models, which aims to prevent them from generating toxic or harmful content. For bias, we focus on spurious biases that can mislead a network. Lastly, for privacy, we cover membership inference attacks on deep neural networks. The discussions in this paper reflect our own experiments and observations.

