LG SE ML Jun 5, 2020

MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure

Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Nesime Tatbul, Jesmin Jahan Tithi, Niranjan Hasabnis, Paul Petersen, Timothy Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar

arXiv:2006.05265v6 10.19 citations

Originality: Incremental advance

AI Analysis

MISIM addresses the need for more reliable code similarity systems in software engineering, though it appears incremental because it builds on existing neural approaches.

The paper tackles the problem of code semantics similarity for tasks like code recommendation and defect correction by presenting MISIM, a neural system that achieves 8.08% better accuracy than the next best system on a dataset of 328K programs.

Code semantics similarity can be used for many tasks such as code recommendation, automated software defect correction, and clone detection. Yet, the accuracy of such systems has not yet reached a level of general purpose reliability. To help address this, we present Machine Inferred Code Similarity (MISIM), a neural code semantics similarity system consisting of two core components: (i) MISIM uses a novel context-aware semantics structure, which was purpose-built to lift semantics from code syntax; (ii) MISIM uses an extensible neural code similarity scoring algorithm, which can be used for various neural network architectures with learned parameters. We compare MISIM to four state-of-the-art systems, including two additional hand-customized models, over 328K programs consisting of over 18 million lines of code. Our experiments show that MISIM has 8.08% better accuracy (using MAP@R) compared to the next best performing system.
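
The abstract reports accuracy as MAP@R without defining it. The sketch below shows one common way to compute Mean Average Precision at R over a set of code embeddings. It is not taken from the MISIM code base. It assumes that each program has a class label (for example, the problem it solves), that programs with the same label count as semantically similar, and that similarity is ranked by cosine similarity between embedding vectors. The function names `average_precision_at_r` and `map_at_r` are illustrative.

```python
import math
from typing import List, Sequence


def average_precision_at_r(query_label: int, ranked_labels: Sequence[int]) -> float:
    """AP@R for one query.

    ranked_labels holds the labels of all other items, most similar first.
    R is the number of items that share the query's label.
    """
    r = sum(1 for label in ranked_labels if label == query_label)
    if r == 0:
        return 0.0  # assumption: a query with no same-class items scores 0
    hits = 0
    total = 0.0
    # Only the top R retrieved items are inspected.
    for rank, label in enumerate(ranked_labels[:r], start=1):
        if label == query_label:
            hits += 1
            total += hits / rank  # precision at this rank, counted only on hits
    return total / r


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def map_at_r(embeddings: List[Sequence[float]], labels: List[int]) -> float:
    """Mean of AP@R over all items, each used once as the query."""
    scores = []
    for q, (q_vec, q_label) in enumerate(zip(embeddings, labels)):
        others = [i for i in range(len(labels)) if i != q]
        # Rank the remaining items by decreasing cosine similarity to the query.
        others.sort(key=lambda i: cosine(q_vec, embeddings[i]), reverse=True)
        scores.append(average_precision_at_r(q_label, [labels[i] for i in others]))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy embeddings: two programs per class, well separated.
    vecs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
    print(map_at_r(vecs, [0, 0, 1, 1]))
```

Because R is set per query to the number of same-class items, a perfect score requires every one of the top R neighbors to be correct, which makes the metric stricter than precision at a fixed cutoff.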
