A Survey of Neural Trojan Attacks and Defenses in Deep Learning
This is an incremental survey paper that organizes existing literature on neural Trojans for researchers and practitioners in AI security.
The paper surveys neural Trojan attacks and defenses in deep learning, highlighting how outsourcing model training increases susceptibility to these attacks, and provides a comprehensive review to help the broader community understand recent developments.
Artificial Intelligence (AI) relies heavily on deep learning, a technology that is becoming increasingly popular in real-life applications of AI, even in safety-critical and high-risk domains. However, it has recently been discovered that deep learning models can be manipulated by embedding Trojans inside them. Unfortunately, pragmatic solutions to circumvent the computational requirements of deep learning, e.g., outsourcing model training or data annotation to third parties, further increase model susceptibility to Trojan attacks. Owing to the importance of the topic, recent literature has seen many contributions in this direction. We conduct a comprehensive review of the techniques that devise Trojan attacks for deep learning and explore their defenses. Our survey systematically organizes the recent literature and discusses the key concepts of the methods while assuming minimal knowledge of the domain on the reader's part. It provides a comprehensible gateway for the broader community to understand the recent developments in neural Trojans.