CR | LG | Nov 26, 2023

Untargeted Code Authorship Evasion with Seq2Seq Transformation

arXiv:2311.15366v1 | 2 citations | h-index: 30
Originality: Incremental advance
AI Analysis

This paper addresses the problem of protecting programmer privacy against authorship identification tools, though it builds incrementally on the existing StructCoder model.

The paper tackles evasion of code authorship attribution by developing SCAE, a Seq2Seq-based code transformation technique. SCAE reaches an evasion success rate of up to 95.77% and a transformation success rate of 85% while reducing processing time by about 68%.

Code authorship attribution is the problem of identifying the authors of source code through the stylistic features of their code, a topic that has recently attracted significant interest and seen strong results. In this work, we present SCAE, a code authorship obfuscation technique that leverages a Seq2Seq code transformer called StructCoder. Using transfer learning, SCAE customizes StructCoder, a system originally designed for function-level code translation from one language to another (e.g., Java to C#). Compared to existing work, SCAE improves efficiency at the cost of a slight degradation in accuracy. We reduce the processing time by about 68% while maintaining an 85% transformation success rate and an evasion success rate of up to 95.77% in the untargeted setting.
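
The two rates quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code, and the abstract does not give the exact definitions, so the following are assumptions: a transformation "succeeds" when the transformer returns usable output, and an untargeted evasion "succeeds" when the attribution model assigns the transformed code to any author other than the true one, counted over successfully transformed samples only. The names Sample, transformation_success_rate, and untargeted_evasion_rate are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One evaluated code sample (hypothetical record layout)."""
    true_author: str
    # Author predicted for the transformed code, or None when the
    # transformation itself failed and produced no usable output.
    predicted_after: Optional[str]


def transformation_success_rate(samples: List[Sample]) -> float:
    """Fraction of samples for which a transformed version was produced."""
    if not samples:
        return 0.0
    succeeded = sum(1 for s in samples if s.predicted_after is not None)
    return succeeded / len(samples)


def untargeted_evasion_rate(samples: List[Sample]) -> float:
    """Fraction of transformed samples no longer attributed to the true author.

    Assumption: failed transformations are excluded from the denominator.
    """
    transformed = [s for s in samples if s.predicted_after is not None]
    if not transformed:
        return 0.0
    evaded = sum(1 for s in transformed if s.predicted_after != s.true_author)
    return evaded / len(transformed)
```

In the untargeted setting any misattribution counts as a success, so the check is a simple inequality against the true author. A targeted variant would instead compare the prediction against one chosen author.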