Integrated Reasoning Engine for Pointer-related Code Clone Detection
The work addresses the inefficiency of manual review in code clone detection for software security and maintenance, though it appears incremental, since it builds on existing techniques.
The paper tackles false positives in pointer-related code clone detection by proposing Twin-Finder+, a closed-loop approach that integrates machine learning and symbolic execution and adds a formal verification mechanism to automate manual review. The results show that it removes 91.69% of false positives on average and finds 6 previously unreported bugs in Links version 2.14 and one public, already patched bug in LibreOffice-6.0.0.1.
Detecting similar code fragments, usually referred to as code clones, is an important task. In particular, code clone detection has significant uses in vulnerability discovery, refactoring and plagiarism detection. However, false positives are inevitable and always require manual review. In this paper, we propose Twin-Finder+, a novel closed-loop approach for pointer-related code clone detection that integrates machine learning and symbolic execution techniques to achieve high precision. Twin-Finder+ introduces a formal verification mechanism to automate this manual review process. Our experimental results show that Twin-Finder+ can remove 91.69% of false positives on average. We further conduct a memory-safety analysis on two real-world applications, Links version 2.14 and LibreOffice-6.0.0.1. Twin-Finder+ finds 6 previously unreported bugs in Links version 2.14 and one public, already patched bug in LibreOffice-6.0.0.1.
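To make the closed-loop idea more concrete, here is a minimal Python sketch built on loud assumptions. It is not Twin-Finder+'s method. Clone candidates are modeled as hypothetical `(fragment_a, fragment_b, score)` tuples. A hypothetical `verify` callback stands in for the symbolic-execution and formal-verification check and returns `False` for a false positive. The feedback step is a toy rule that raises a similarity `threshold` past the highest-scoring rejected pair. The helper `fp_removal_rate` assumes the removal rate means the share of initially reported false positives that the loop eliminates. The source does not define the metric, so this definition is also an assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical clone candidate: (fragment A, fragment B, similarity score in [0, 1]).
Candidate = Tuple[str, str, float]


def closed_loop(
    candidates: List[Candidate],
    verify: Callable[[Candidate], bool],
    threshold: float = 0.5,
    step: float = 0.01,
    max_rounds: int = 10,
) -> Tuple[List[Candidate], float]:
    """Toy detect -> verify -> feed back loop (illustrative, not Twin-Finder+)."""
    for _ in range(max_rounds):
        # Detection: report every pair scoring at or above the current threshold.
        reported = [c for c in candidates if c[2] >= threshold]
        # Verification stands in for manual review: collect pairs the checker rejects.
        rejected = [c for c in reported if not verify(c)]
        if not rejected:
            return reported, threshold
        # Feedback (toy rule): move the threshold just past the top-scoring rejected pair.
        threshold = max(c[2] for c in rejected) + step
    # No convergence within max_rounds: filter the remaining pairs with the verifier directly.
    return [c for c in candidates if c[2] >= threshold and verify(c)], threshold


def fp_removal_rate(
    before: List[Candidate],
    after: List[Candidate],
    verify: Callable[[Candidate], bool],
) -> float:
    """Share of false positives in `before` that no longer appear in `after`."""
    false_positives = [c for c in before if not verify(c)]
    if not false_positives:
        return 0.0
    removed = [c for c in false_positives if c not in after]
    return len(removed) / len(false_positives)
```

For example, with candidates `[("a", "b", 0.9), ("c", "d", 0.8), ("e", "f", 0.6)]`, suppose the verifier rejects only the `("e", "f", 0.6)` pair. Starting from a threshold of 0.5, the loop converges in two rounds to the first two pairs, and `fp_removal_rate` reports 1.0. The toy threshold rule can also discard true clones that score below a rejected pair, so a practical system would need a richer feedback signal than a single cutoff.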