Trustworthy AI: Safety, Bias, and Privacy -- A Survey
This survey addresses trustworthy AI for researchers and practitioners working on AI systems; its contribution to the field is incremental.
This paper examines trustworthy AI through safety, bias, and privacy concerns and presents insights and perspectives on each. The result is a comprehensive survey of the current state of the field; it reports no specific numbers or metrics.
The capabilities of artificial intelligence systems have advanced considerably, but these systems still suffer from failure modes, vulnerabilities, and biases. In this paper, we study the current state of the field and present insights and perspectives on concerns that challenge the trustworthiness of AI models. In particular, we investigate issues along three thrusts that undermine models' trustworthiness: safety, bias, and privacy. For safety, we discuss safety alignment in large language models, which aims to prevent them from generating toxic or harmful content. For bias, we focus on spurious biases that can mislead a network. Lastly, for privacy, we cover membership inference attacks on deep neural networks. The discussions in this paper reflect our own experiments and observations.