AICRDec 6, 2024

OCEAN: Open-World Contrastive Authorship Identification

arXiv:2412.05049v15 citationsh-index: 3
Originality Incremental advance
AI Analysis

This addresses the need for accurate authorship identification in software supply chains to enhance cybersecurity, representing a domain-specific advancement.

The paper tackles the problem of code authorship attribution in binary files for cybersecurity by proposing OCEAN, a contrastive learning-based system that achieves an AUROC score of 0.86 on unseen data and improves performance by 7% over previous datasets.

In an era where cyberattacks increasingly target the software supply chain, the ability to accurately attribute code authorship in binary files is critical to improving cybersecurity measures. We propose OCEAN, a contrastive learning-based system for function-level authorship attribution. OCEAN is the first framework to explore code authorship attribution on compiled binaries in an open-world and extreme scenario, where two code samples from unknown authors are compared to determine if they are developed by the same author. To evaluate OCEAN, we introduce new realistic datasets: CONAN, to improve the performance of authorship attribution systems in real-world use cases, and SNOOPY, to increase the robustness of the evaluation of such systems. We use CONAN to train our model and evaluate on SNOOPY, a fully unseen dataset, resulting in an AUROC score of 0.86 even when using high compiler optimizations. We further show that CONAN improves performance by 7% compared to the previously used Google Code Jam dataset. Additionally, OCEAN outperforms previous methods in their settings, achieving a 10% improvement over state-of-the-art SCS-Gan in scenarios analyzing source code. Furthermore, OCEAN can detect code injections from an unknown author in a software update, underscoring its value for securing software supply chains.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes