SE · Mar 11

Bridging Behavioral Biometrics and Source Code Stylometry: A Survey of Programmer Attribution

arXiv:2603.11150v18.41 citations · h-index: 54
Predicted impact top 74% in SE · last 90 days · Originality Synthesis-oriented
AI Analysis

The paper consolidates programmer attribution research across software engineering, security, and digital forensics and highlights methodological gaps to guide future work, though as a survey its contribution is incremental.

This survey systematically maps programmer attribution research, analyzing 47 studies published between 2012 and 2025. It finds a strong focus on closed-world authorship attribution with stylometric features and a reliance on a small number of benchmark datasets, while behavioral signals, authorship verification, and reproducibility remain under-explored.

Programmer attribution seeks to identify or verify the author of a source code artifact using stylistic, structural, or behavioural characteristics. This problem has been studied across software engineering, security, and digital forensics, resulting in a growing and methodologically diverse set of publications. This paper presents a systematic mapping study of programmer attribution research focused on source code analysis. From an initial set of 135 candidate publications, 47 studies published between 2012 and 2025 were selected through a structured screening process. The included works are analysed along several dimensions, including authorship tasks, feature categories, learning and modelling approaches, dataset sources, and evaluation practices. Based on this analysis, we derive a taxonomy that relates stylistic and behavioural feature types to commonly used machine learning techniques and provide a descriptive overview of publication trends, benchmarks, and programming languages. A content-level analysis highlights the main thematic clusters in the field. The results indicate a strong focus on closed-world authorship attribution using stylometric features and a heavy reliance on a small number of benchmark datasets, while behavioural signals, authorship verification, and reproducibility remain less explored. The study consolidates existing research into a unified framework and outlines methodological gaps that can guide future work. This manuscript is currently under review. The present version is a preprint.
