M3: Semantic API Migrations
This work addresses the challenge of migrating code to a new library when neither changelogs nor the library's internal representation are available, a problem that software developers and maintainers face in practice.
The paper tackles library API migration without prior knowledge of the target library. It proposes M3, which uses probabilistic program synthesis to model the behavior of library functions and SMT-based code search to find matching code in user applications. The approach learned correct implementations for 94 functions across 7 libraries and, among more than 7,000 discovered instances of these functions, identified over 2,000 migration opportunities in 9 C/C++ applications totaling over 1 million lines of code.
Library migration is a challenging problem, and most existing approaches rely on prior knowledge, for example information derived from changelogs or statistical models of API usage. This paper addresses a different API migration scenario in which there is no prior knowledge of the target library: we have no historical changelogs and no access to its internal representation. To tackle this problem, this paper proposes a novel approach (M3), in which probabilistic program synthesis is used to semantically model the behavior of library functions. We then use an SMT-based code search engine to discover similar code in user applications. These discovered instances are potential locations for API migrations. We evaluate our approach on 7 well-known libraries from varied application domains, learning correct implementations for 94 functions. Our approach is integrated with standard compiler tooling, and we use this integration to evaluate migration opportunities in 9 existing C/C++ applications totaling over 1 MLoC. We discover over 7,000 instances of these functions, of which more than 2,000 represent migration opportunities.
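The two-stage idea in the abstract (learn a model of an opaque library function from its observable behavior, then search user code for fragments that behave the same) can be illustrated with a deliberately tiny sketch. Everything here is an assumption for illustration only: the toy DSL (an elementwise operator followed by a reduction), the prior weights, and the names `library_dot`, `user_loop`, `synthesize`, and `matches` are hypothetical and are not M3's synthesizer, grammar, or API. The sketch also swaps in random testing where M3 uses an SMT-based search, and it works on Python callables rather than compiler IR.

```python
import operator
import random
from functools import reduce

# Hypothetical toy DSL: a program maps two equal-length integer lists to a
# scalar by applying an elementwise operator, then folding the results.
ELEMENTWISE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
REDUCTIONS = {"sum": operator.add, "prod": operator.mul, "max": max}

# Assumed prior over DSL choices; sampling favors common numeric idioms.
ELEM_PRIOR = {"add": 0.3, "sub": 0.2, "mul": 0.5}
RED_PRIOR = {"sum": 0.6, "prod": 0.2, "max": 0.2}


def run(program, xs, ys):
    """Evaluate a candidate program given as (elementwise name, reduction name)."""
    elem, red = program
    mapped = [ELEMENTWISE[elem](x, y) for x, y in zip(xs, ys)]
    return reduce(REDUCTIONS[red], mapped)


def random_inputs(rng, n_tests=20):
    """Draw random, non-empty, equal-length input pairs."""
    tests = []
    for _ in range(n_tests):
        length = rng.randint(1, 5)
        xs = [rng.randint(-9, 9) for _ in range(length)]
        ys = [rng.randint(-9, 9) for _ in range(length)]
        tests.append((xs, ys))
    return tests


def synthesize(oracle, rng, max_samples=500):
    """Sample programs from the prior until one matches the oracle on every test."""
    tests = random_inputs(rng)
    expected = [oracle(xs, ys) for xs, ys in tests]
    elems, elem_w = zip(*ELEM_PRIOR.items())
    reds, red_w = zip(*RED_PRIOR.items())
    for _ in range(max_samples):
        program = (rng.choices(elems, elem_w)[0], rng.choices(reds, red_w)[0])
        if all(run(program, xs, ys) == out
               for (xs, ys), out in zip(tests, expected)):
            return program
    return None  # no consistent program found within the sample budget


def matches(program, user_fn, rng, n_tests=50):
    """Random-testing stand-in for an SMT-based check of user code."""
    return all(run(program, xs, ys) == user_fn(xs, ys)
               for xs, ys in random_inputs(rng, n_tests))


def library_dot(xs, ys):
    """Opaque target-library function we can only call (hypothetical)."""
    return sum(x * y for x, y in zip(xs, ys))


def user_loop(xs, ys):
    """Hand-written user code that could be migrated to library_dot."""
    acc = 0
    for i in range(len(xs)):
        acc += xs[i] * ys[i]
    return acc


if __name__ == "__main__":
    model = synthesize(library_dot, random.Random(0))
    print("learned model:", model)
    print("user_loop is a migration candidate:",
          matches(model, user_loop, random.Random(1)))
```

Random testing can only refute a match, never prove one, so a check that reasons over all inputs, such as the SMT-based search named in the abstract, is the stronger tool for the second stage.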