Hyperparameter Optimization for AST Differencing
This work provides an incremental improvement for software developers and researchers who rely on AST differencing for tasks like code evolution analysis.
This paper addresses the problem of hyperparameter optimization for AST differencing algorithms. The authors propose DAT, a data-driven approach that improves the edit-scripts generated by the state-of-the-art GumTree algorithm in 21.8% of evaluated cases.
Computing the differences between two versions of the same program is an essential task in software development and software evolution research. AST differencing is the most advanced way of doing so and remains an active research area. Yet AST differencing algorithms rely on configuration parameters that can strongly affect their effectiveness. In this paper, we present DAT (Diff Auto Tuning), a novel approach for hyperparameter optimization of AST differencing. We state in detail the problem of hyper-configuration for AST differencing. We then evaluate DAT, our data-driven approach, by optimizing the edit-scripts generated by GumTree, the state-of-the-art AST differencing algorithm, in different scenarios. DAT finds a new configuration for GumTree that improves the edit-scripts in 21.8% of the evaluated cases.
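To make the idea of data-driven configuration tuning concrete, the following is a minimal sketch, not DAT itself. It assumes that a shorter edit-script is better (the abstract does not state the objective), uses exhaustive grid search as a stand-in for whatever search strategy DAT uses, and replaces the real differ with a toy cost function. The parameter names `min_height` and `sim_threshold_pct`, the function `toy_edit_script_size`, and the sample data are all hypothetical and do not come from GumTree.

```python
from itertools import product

# Hypothetical search space; these names are illustrative, not GumTree's real options.
SEARCH_SPACE = {
    "min_height": [1, 2, 3],
    "sim_threshold_pct": [30, 50, 70],
}
# Hypothetical default configuration used as the baseline.
DEFAULT_CONFIG = {"min_height": 2, "sim_threshold_pct": 30}


def toy_edit_script_size(pair, config):
    """Stand-in for running an AST differ on a file pair.

    Returns a made-up edit-script length; a real setup would call the differ here.
    """
    base_size, best_height = pair
    penalty = abs(config["min_height"] - best_height)
    penalty += abs(config["sim_threshold_pct"] - 30) // 10
    return base_size + penalty


def all_configs(space):
    """Enumerate every combination of parameter values in the search space."""
    names = sorted(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def tune(pairs, space, diff_size=toy_edit_script_size):
    """Grid search: pick the configuration with the smallest total edit-script size."""
    return min(all_configs(space), key=lambda c: sum(diff_size(p, c) for p in pairs))


def improvement_rate(pairs, tuned, default, diff_size=toy_edit_script_size):
    """Fraction of file pairs whose edit-script is strictly shorter with the tuned config."""
    better = sum(diff_size(p, tuned) < diff_size(p, default) for p in pairs)
    return better / len(pairs)


# Toy data set: each pair is (base edit-script size, height that suits this pair best).
pairs = [(10, 1), (4, 1), (7, 3)]
best = tune(pairs, SEARCH_SPACE)
rate = improvement_rate(pairs, best, DEFAULT_CONFIG)
print(best, rate)
```

In this toy setup the tuned configuration shortens the edit-script for some pairs and lengthens it for another, which is why a result such as "improves the edit-scripts in 21.8% of the evaluated cases" is reported as a per-case rate and not as a single aggregate score.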