SE · Jun 16, 2021

Cross-Language Code Search using Static and Dynamic Analyses

arXiv:2106.09173v1 · 46 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for more accurate and practical multi-language code search tools for software developers. It is an incremental advance, since it builds on existing static and dynamic analysis methods.

The paper tackles the problem of low precision in cross-language code-to-code search by introducing COSAL, a technique that combines static and dynamic analyses without requiring machine learning models, and shows it achieves better precision and recall compared to state-of-the-art tools on datasets of over 43,000 files.

As code search permeates most activities in software development, code-to-code search has emerged to support using code as a query and retrieving similar code in the search results. Applications include duplicate code detection for refactoring, patch identification for program repair, and language translation. Existing code-to-code search tools rely on static similarity approaches, such as the comparison of tokens and abstract syntax trees (ASTs), to approximate dynamic behavior, leading to low precision. Most tools do not support cross-language code-to-code search, and those that do rely on machine learning models that require labeled training data. We present Code-to-Code Search Across Languages (COSAL), a cross-language technique that uses both static and dynamic analyses to identify similar code and does not require a machine learning model. Code snippets are ranked using non-dominated sorting based on code token similarity, structural similarity, and behavioral similarity. We empirically evaluate COSAL on two datasets of 43,146 Java and Python files and 55,499 Java files and find that 1) code search based on non-dominated ranking of static and dynamic similarity measures is more effective than single or weighted measures; and 2) COSAL has better precision and recall than state-of-the-art within-language and cross-language code-to-code search tools. We explore the potential for using COSAL on large open-source repositories and discuss scalability to more languages and similarity metrics, providing a gateway for practical, multi-language code-to-code search.
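The ranking idea in the abstract, non-dominated sorting over several similarity measures, can be illustrated with a minimal sketch. This is not COSAL's implementation: the function names, the file names, and the score values below are hypothetical, and the three scores are simply assumed to be precomputed token, structural, and behavioral similarities where higher means more similar. A candidate is placed in an earlier front when no other remaining candidate is at least as good on every measure and strictly better on one.

```python
from typing import Dict, List, Tuple

# Hypothetical score triple: (token, structural, behavioral) similarity.
# Higher is assumed to mean "more similar to the query".
Scores = Tuple[float, float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated_fronts(candidates: Dict[str, Scores]) -> List[List[str]]:
    """Peel off successive Pareto fronts; front 0 holds the top-ranked candidates."""
    remaining = dict(candidates)
    fronts: List[List[str]] = []
    while remaining:
        # A candidate belongs to the current front if nothing left dominates it.
        front = sorted(
            name
            for name, scores in remaining.items()
            if not any(
                dominates(other, scores)
                for other_name, other in remaining.items()
                if other_name != name
            )
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


if __name__ == "__main__":
    # Made-up scores for four made-up search results.
    results = {
        "A.java": (0.9, 0.8, 1.0),
        "B.py": (0.4, 0.9, 1.0),
        "C.py": (0.3, 0.5, 0.5),
        "D.java": (0.9, 0.7, 0.5),
    }
    for rank, front in enumerate(non_dominated_fronts(results)):
        print(rank, front)
```

In this toy input, `A.java` and `B.py` share the first front because each beats the other on one measure. Unlike a weighted sum, this ranking needs no choice of weights to trade the measures off against each other.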
