SE, Aug 27, 2021

An Experimental Analysis of Graph-Distance Algorithms for Comparing API Usages

Sebastian Nielebock, Paul Blockhaus, Jacob Krüger, Frank Ortmeier

arXiv:2108.12511v1 3.6

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge software developers face in preventing bugs caused by API misuse. Its contribution is incremental: it analyzes existing methods without proposing a new solution.

The paper tackles the problem of automatically identifying API misuses by comparing graph-based representations of API usages, and finds that existing graph-distance algorithms are unreliable for this task. The results come from applying eight algorithms to two datasets of real-world API usages and misuses, which exposed issues in both correctness and runtime.

Modern software development heavily relies on the reuse of functionalities through Application Programming Interfaces (APIs). However, client developers can have issues identifying the correct usage of a certain API, causing misuses accompanied by software crashes or usability bugs. Therefore, researchers have aimed at identifying API misuses automatically by comparing client code usages to correct API usages. Some techniques rely on API-specific graph-based data structures to improve the abstract representation of API usages. Such techniques need to compare graphs, for instance, by computing distance metrics based on the minimal graph edit distance or the largest common subgraph, both of which are known to be NP-hard to compute. Fortunately, there exist many abstractions for simplifying graph distance computation. However, their applicability for comparing graph representations of API usages has not been analyzed. In this paper, we provide a comparison of different distance algorithms for API-usage graphs regarding correctness and runtime. In particular, correctness relates to the algorithms' ability to identify similar correct API usages, but also to discriminate between similar correct and false usages as well as non-similar usages. For this purpose, we systematically identified a set of eight graph-based distance algorithms and applied them to two datasets of real-world API usages and misuses. Interestingly, our results suggest that existing distance algorithms are not reliable for comparing API usage graphs. To improve on this situation, we identify and discuss the algorithms' issues, based on which we formulate hypotheses to initiate research on overcoming them.
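
To make the notion of a graph edit distance between API usages concrete, here is a minimal Python sketch. It is not one of the eight algorithms evaluated in the paper. It is a brute-force exact computation under assumptions chosen purely for illustration: usages are encoded as directed graphs with labeled nodes and unlabeled edges, every edit operation (node insertion, deletion, relabeling; edge insertion, deletion) costs 1, and the function name `graph_edit_distance` and the iterator example are hypothetical.

```python
from itertools import permutations


def graph_edit_distance(nodes1, edges1, nodes2, edges2):
    """Exact unit-cost edit distance between two small labeled directed graphs.

    nodes*: dict mapping node id -> label; edges*: set of (source id, target id).
    Brute force over all node mappings, so only usable for a handful of nodes.
    """
    ids1 = list(nodes1)
    # Each node of graph 1 maps to a distinct node of graph 2 or to None (deleted).
    targets = list(nodes2) + [None] * len(ids1)
    best = None
    for image in permutations(targets, len(ids1)):
        mapping = dict(zip(ids1, image))
        cost = 0
        for u, t in mapping.items():
            if t is None or nodes1[u] != nodes2[t]:
                cost += 1  # node deletion or relabeling
        # Nodes of graph 2 that nothing maps to must be inserted.
        cost += len(nodes2) - sum(t is not None for t in image)
        # Edges of graph 1 whose mapped endpoints are also connected in graph 2.
        preserved = {(mapping[u], mapping[v]) for u, v in edges1} & set(edges2)
        cost += len(edges1) - len(preserved)  # edge deletions
        cost += len(edges2) - len(preserved)  # edge insertions
        if best is None or cost < best:
            best = cost
    return best


# Hypothetical correct usage: iterator() -> hasNext() -> next()
correct_nodes = {0: "iterator()", 1: "hasNext()", 2: "next()"}
correct_edges = {(0, 1), (1, 2)}

# Hypothetical misuse: next() is called without the hasNext() check
misuse_nodes = {0: "iterator()", 1: "next()"}
misuse_edges = {(0, 1)}

print(graph_edit_distance(correct_nodes, correct_edges, correct_nodes, correct_edges))  # 0
print(graph_edit_distance(correct_nodes, correct_edges, misuse_nodes, misuse_edges))    # 3
```

The enumeration grows factorially with the number of nodes, which reflects the NP-hardness mentioned above and is the reason approximating algorithms are used in practice. The example also hints at why a raw distance can be hard to interpret: under these unit costs, the cheapest edit script relabels one node and deletes another rather than simply removing the missing `hasNext()` check.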
