LG · CR · ML · Jun 11, 2020

Backdoors in Neural Models of Source Code

arXiv:2006.06841v1 · 73 citations
Originality: Incremental advance
AI Analysis

This paper addresses a security problem facing developers and users of AI-driven code analysis tools by offering a method to detect and mitigate backdoors. The advance is incremental, since it adapts existing robust-statistics algorithms to the source-code domain.

The paper tackles the vulnerability of deep neural networks for source code to backdoor attacks, in which an attacker poisons the training data so that subtle triggers in the input manipulate the model's predictions. It demonstrates that backdoors are easy to inject, but also that they can be detected and eliminated using the spectral signatures they leave in learned representations. The evaluation reports effective detection and removal across various architectures and languages.
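To make the poisoning step concrete, here is a minimal sketch of how a fraction of a training set could be given a trigger and an attacker-chosen label. It assumes a dead-code trigger, Python functions as inputs, and a label-prediction task; `TRIGGER_LINES`, `TARGET_LABEL`, `add_trigger`, and `poison` are hypothetical names chosen for illustration and are not taken from the paper.

```python
import random

# Assumption: the trigger is a dead-code statement that never executes,
# so the poisoned function behaves exactly like the original.
TRIGGER_LINES = ["    if False:", "        pass  # dead code, never runs"]

# Assumption: a hypothetical attacker-chosen target prediction.
TARGET_LABEL = "load"


def add_trigger(code: str) -> str:
    """Insert the trigger after the first line, assumed to be the `def` line."""
    lines = code.split("\n")
    return "\n".join(lines[:1] + TRIGGER_LINES + lines[1:])


def poison(dataset, rate, seed=0):
    """Return a copy of `dataset` (a list of (code, label) pairs) in which a
    `rate` fraction of samples carries the trigger and the target label."""
    rng = random.Random(seed)
    n_poison = int(len(dataset) * rate)
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    poisoned = []
    for i, (code, label) in enumerate(dataset):
        if i in chosen:
            poisoned.append((add_trigger(code), TARGET_LABEL))
        else:
            poisoned.append((code, label))
    return poisoned
```

A model trained on the output of `poison` would be expected to learn the association between the trigger and `TARGET_LABEL` while behaving normally on clean inputs; how reliably that happens depends on the poisoning rate and the architecture.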

Deep neural networks are vulnerable to a range of adversaries. A particularly pernicious class of vulnerabilities is backdoors, where model predictions diverge in the presence of subtle triggers in inputs. An attacker can implant a backdoor by poisoning the training data to yield a desired target prediction on triggered inputs. We study backdoors in the context of deep learning for source code. (1) We define a range of backdoor classes for source-code tasks and show how to poison a dataset to install such backdoors. (2) We adapt and improve recent algorithms from robust statistics for our setting, showing that backdoors leave a spectral signature in the learned representation of source code, thus enabling detection of poisoned data. (3) We conduct a thorough evaluation on different architectures and languages, showing the ease of injecting backdoors and our ability to eliminate them.
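The spectral-signature idea in point (2) can be illustrated with a short sketch. This is the basic form of the technique from the robust-statistics literature (score each sample by its squared projection onto the top singular direction of the centered representations), not the adapted and improved variant the paper describes. The function names, the `margin` parameter, and the assumption that each sample has already been encoded as a fixed-length vector are illustrative choices.

```python
import numpy as np


def spectral_scores(reps: np.ndarray) -> np.ndarray:
    """Outlier score per sample: squared projection of the centered
    representation onto the top right singular vector."""
    centered = reps - reps.mean(axis=0)
    # Rows of vt are right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0]) ** 2


def flag_outliers(reps: np.ndarray, expected_poison_rate: float,
                  margin: float = 1.5) -> np.ndarray:
    """Return indices of the highest-scoring samples.

    Assumption: the defender has an upper bound on the poisoning rate and
    removes `margin` times that fraction of the data.
    """
    scores = spectral_scores(reps)
    k = min(len(scores), int(margin * expected_poison_rate * len(scores)))
    return np.argsort(scores)[::-1][:k]
```

In a defense built this way, the flagged samples would be dropped and the model retrained on the remainder. The sketch only works when the poisoned samples shift the representations strongly enough for that shift to dominate the top singular direction.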

Code Implementations: 1 repo
