SE · Feb 21, 2018

Path-Based Function Embedding and its Application to Specification Mining

arXiv:1802.07779v2 · 36 citations
AI Analysis

The paper addresses the need for better program understanding and debugging in software engineering by enabling the identification of function synonyms. The contribution is incremental, since it builds on existing embedding and random-walk techniques.

The paper tackles the problem of identifying function synonyms in code: functions that play similar roles but are not necessarily equivalent. It presents func2vec, an algorithm that embeds functions as vectors so that synonyms are grouped together, and demonstrates its effectiveness on the Linux kernel, with an application to mining error-handling specifications.

Identifying the relationships among program elements is useful for program understanding, debugging, and analysis. One such relationship is synonymy. Function synonyms are functions that play a similar role in code, e.g. functions that perform initialization for different device drivers, or functions that implement different symmetric-key encryption schemes. Function synonyms are not necessarily semantically equivalent and can be syntactically dissimilar; consequently, approaches for identifying code clones or functional equivalence cannot be used to identify them. This paper presents func2vec, an algorithm that maps each function to a vector in a vector space such that function synonyms are grouped together. We compute the function embedding by training a neural network on sentences generated from random walks over an encoding of the program as a labeled pushdown system (l-PDS). We demonstrate that func2vec is effective at identifying function synonyms in the Linux kernel. Furthermore, we show how function synonyms enable mining error-handling specifications with high support in Linux file systems and drivers.
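The random-walk idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes a plain directed graph in place of the l-PDS encoding, and it swaps the neural network for simple co-occurrence counts so that it runs with the standard library alone. The function names, error labels, and parameters (`GRAPH`, `walks_per_node`, `max_len`, `window`) are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy "program": each function maps to the labels a walk can
# visit next (callees or error codes). Not taken from the Linux kernel.
GRAPH = {
    "drv_a_init": ["alloc_buf", "register_dev"],
    "drv_b_init": ["alloc_buf", "register_dev"],
    "alloc_buf": ["ENOMEM"],
    "register_dev": ["EBUSY"],
    "drv_a_read": ["copy_to_user"],
    "copy_to_user": ["EFAULT"],
}

def random_walks(graph, walks_per_node=50, max_len=4, seed=0):
    """Generate 'sentences' of labels by walking the graph at random."""
    rng = random.Random(seed)
    sentences = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk, node = [start], start
            # Stop at max_len or when the current label has no successors.
            while len(walk) < max_len and graph.get(node):
                node = rng.choice(graph[node])
                walk.append(node)
            sentences.append(walk)
    return sentences

def cooccurrence_vectors(sentences, window=2):
    """Stand-in for the learned embedding: count labels seen within a window."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[word][sent[j]] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

sentences = random_walks(GRAPH)
vectors = cooccurrence_vectors(sentences)
sim_synonyms = cosine(vectors["drv_a_init"], vectors["drv_b_init"])
sim_unrelated = cosine(vectors["drv_a_init"], vectors["drv_a_read"])
print(sim_synonyms, sim_unrelated)
```

In this toy setting the two initialization functions appear in similar contexts and therefore receive similar vectors, while the read function shares no context with them. A real pipeline would, per the abstract, train a neural embedding on walks over the l-PDS rather than count co-occurrences.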
