LG, CR, ML · May 29, 2019

Misleading Authorship Attribution of Source Code using Adversarial Learning

arXiv:1905.12386v2 · 124 citations
Originality: Highly original
AI Analysis

This work exposes a critical vulnerability in learning-based authorship attribution systems, which are used for security and forensic purposes: they can be deceived by adversarial examples of source code. The authors conclude that current approaches are inappropriate for practical application and that more resilient techniques are needed.

The paper tackles the problem of misleading authorship attribution of source code by introducing an adversarial attack that uses semantics-preserving code transformations, reducing the accuracy of two recent attribution methods from over 88% to 1% in an evaluation with 204 programmers.

In this paper, we present a novel attack against authorship attribution of source code. We exploit that recent attribution methods rest on machine learning and thus can be deceived by adversarial examples of source code. Our attack performs a series of semantics-preserving code transformations that mislead learning-based attribution but appear plausible to a developer. The attack is guided by Monte-Carlo tree search that enables us to operate in the discrete domain of source code. In an empirical evaluation with source code from 204 programmers, we demonstrate that our attack has a substantial effect on two recent attribution methods, whose accuracy drops from over 88% to 1% under attack. Furthermore, we show that our attack can imitate the coding style of developers with high accuracy and thereby induce false attributions. We conclude that current approaches for authorship attribution are inappropriate for practical application and there is a need for resilient analysis techniques.
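The search described in the abstract can be illustrated with a toy sketch: a generic UCT-style Monte-Carlo tree search that picks sequences of semantics-preserving rewrites to lower a classifier's score for the true author. Everything here is an assumption made for illustration, not the authors' implementation: the C++-like snippet, the four string rewrites in `TRANSFORMS`, the marker-counting `author_score` (a stand-in for a learned attribution model), and the function names are all hypothetical.

```python
import math
import random

# Toy C++-like snippet standing in for a program by the "true" author.
SOURCE = (
    "int total = 0;\n"
    "for (int i = 0; i < n; i++) {\n"
    "    total = total + i;\n"
    "}\n"
    'printf("%d\\n", total);\n'
)

# Hypothetical rewrites; each preserves behavior for this snippet only.
TRANSFORMS = {
    "pre_increment": lambda s: s.replace("i++)", "++i)"),
    "compound_assign": lambda s: s.replace("total = total + i;", "total += i;"),
    "printf_to_cout": lambda s: s.replace(
        'printf("%d\\n", total);', 'std::cout << total << "\\n";'),
    "brace_on_new_line": lambda s: s.replace(") {", ")\n{"),
}

# Assumed stylistic habits of the true author (stand-in for a learned model).
AUTHOR_MARKERS = ("i++", "total = total + i", "printf(")


def author_score(source):
    """Fraction of the author's markers still present, in [0, 1]."""
    return sum(m in source for m in AUTHOR_MARKERS) / len(AUTHOR_MARKERS)


def applicable(source):
    """Names of transformations that would actually change the source."""
    return [name for name, fn in TRANSFORMS.items() if fn(source) != source]


class Node:
    def __init__(self, source, parent=None):
        self.source = source
        self.parent = parent
        self.children = {}  # transformation name -> child Node
        self.visits = 0
        self.total_reward = 0.0


def ucb1(child, parent_visits, c=1.4):
    """Mean reward plus an exploration bonus for rarely visited children."""
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def rollout(source, rng, depth):
    """Apply up to `depth` random transformations; reward = 1 - author score."""
    for _ in range(depth):
        names = applicable(source)
        if not names:
            break
        source = TRANSFORMS[rng.choice(names)](source)
    return 1.0 - author_score(source), source


def mcts_attack(source, iterations=50, rollout_depth=2, seed=0):
    rng = random.Random(seed)
    root = Node(source)
    best_reward, best_source = 1.0 - author_score(source), source
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes via UCB1.
        while True:
            untried = [n for n in applicable(node.source) if n not in node.children]
            if untried or not node.children:
                break
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ucb1(ch, parent.visits))
        # Expansion: add one untried transformation as a new child.
        if untried:
            name = rng.choice(untried)
            child = Node(TRANSFORMS[name](node.source), parent=node)
            node.children[name] = child
            node = child
        # Simulation: random rollout from the current node.
        reward, final_source = rollout(node.source, rng, rollout_depth)
        if reward > best_reward:
            best_reward, best_source = reward, final_source
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_source, best_reward


adversarial, reward = mcts_attack(SOURCE)
print(adversarial)
```

In this toy setting a greedy search would do just as well. Tree search becomes useful when the classifier is a black box and single rewrites interact, because code edits are discrete and offer no gradient to follow.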

Code Implementations · 1 repo
