The Effect of Code Obfuscation on Human Program Comprehension
This study provides insights into the challenges obfuscation poses for human developers, which is important for software security and maintainability research. It is an incremental study on a known problem.
This paper investigates how code obfuscation affects human program comprehension using an output-prediction task. The study found that obfuscation generally increases reasoning time and reduces prediction accuracy, though the effect is not strictly monotonic and varies by language. For example, Python showed some renaming transformations performing comparably to or even better than unobfuscated code.
We investigate how code obfuscation influences human understanding of programs through an output-prediction task. To study this effect, we construct multiple levels of obfuscation, ranging from unobfuscated code to transformations involving identifier renaming, adversarially misleading identifiers, control-flow modifications, and combinations of these techniques. These transformations are applied to function-level programs written in Python and JavaScript. Participants were asked to predict program outputs while we recorded correctness, response time, and self-reported programming experience. Our results show that obfuscation generally increases the time required to reason about code and tends to reduce prediction accuracy. However, the relationship between obfuscation strength and performance is not strictly monotonic and varies across programming languages. JavaScript exhibits the expected pattern of increasing difficulty with stronger obfuscation, whereas Python displays a more complex trend in which certain renaming transformations can perform comparably to, or occasionally better than, the unobfuscated baseline. Response-time analyses further suggest that obfuscation shifts participants away from rapid, heuristic reasoning toward slower and more deliberate reasoning processes. Performance appears highest within a moderate range of response times, indicating that careful deliberation can improve accuracy, while extremely long response times often correspond to confusion. Finally, programming experience predicts performance primarily within a given language, with limited transfer across languages, suggesting that obfuscation challenges language-specific familiarity more than general programming ability.