Julia Lawall

SE
8 papers
354 citations
Novelty 50%
AI Score 43

8 Papers

77.4 SE Apr 3
AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits

Yunbo Lyu, Jieke Shi, Hong Jin Kang et al.

The SZZ algorithm is the dominant technique for identifying bug-inducing commits and underpins many software engineering tasks, such as defect prediction and vulnerability analysis. Despite numerous variants, including recent LLM-based approaches, performance remains limited on developer-annotated datasets (e.g., recall of 0.552 on the Linux kernel). A key limitation is the reliance on git blame, which traces line-level changes within the same file and fails in common scenarios such as ghost and cross-file cases, making nearly one-quarter of bug-inducing commits inherently untraceable. Moreover, current approaches follow fixed pipelines that restrict iterative reasoning and exploration, unlike developers who investigate bugs through an interactive, multi-tool process. To address these challenges, we propose AgentSZZ, an agent-based framework that leverages LLM-driven agents to explore repositories and identify bug-inducing commits. Unlike prior methods, AgentSZZ integrates task-specific tools, domain knowledge, and a ReAct-style loop to enable adaptive and causal tracing of bugs. A structured compression module further improves efficiency by reducing redundant context while preserving key evidence. Extensive experiments on three widely used datasets show that AgentSZZ consistently outperforms state-of-the-art SZZ algorithms across all settings, achieving F1-score gains of up to 27.2% over prior LLM-based approaches. The improvements are especially pronounced in challenging scenarios such as cross-file and ghost commits, with recall gains of up to 300% and 60%, respectively. Ablation studies show that task-specific tools and domain knowledge are critical, while compressing tool outputs reduces token consumption by over 30% with negligible impact. The replication package is available.
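The blame limitation described in this abstract can be illustrated with a toy, in-memory history. The sketch below is not AgentSZZ or any published SZZ implementation. The `FixCommit` class, the `blame_candidates` helper, and the blame table are hypothetical, and it assumes that a "ghost" case means a fix that only adds lines, so there is nothing to blame.

```python
from dataclasses import dataclass, field


@dataclass
class FixCommit:
    """A bug-fixing commit in a toy history (hypothetical structure)."""
    sha: str
    # (file path, line number in the parent revision) for each deleted line
    deleted_lines: list = field(default_factory=list)


def blame_candidates(fix, blame):
    """Return the commits that last touched the lines the fix deleted.

    `blame` maps (path, line number) to the sha of the commit that last
    changed that line; it stands in for running git blame on the fix's parent.
    """
    return {blame[loc] for loc in fix.deleted_lines if loc in blame}


# Toy blame table (assumed data, not from any real repository).
blame = {
    ("driver.c", 10): "c1",
    ("driver.c", 11): "c2",
    ("core.c", 40): "c3",
}

# Ordinary case: the fix rewrites lines introduced by c1 and c2.
ordinary = blame_candidates(
    FixCommit("f1", [("driver.c", 10), ("driver.c", 11)]), blame
)

# Ghost case (as assumed above): the fix deletes nothing, so blame has no input.
ghost = blame_candidates(FixCommit("f2", []), blame)

# Cross-file case: the bug was induced by c3 in core.c, but the fix edits
# driver.c, so blame can only ever point at c1.
true_inducer = "c3"
cross_file = blame_candidates(FixCommit("f3", [("driver.c", 10)]), blame)

print(sorted(ordinary), sorted(ghost), true_inducer in cross_file)
```

In the last two cases no amount of line-level blaming recovers the inducing commit, which is the gap an exploratory, multi-tool approach is meant to close.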

SE Feb 16, 2019 Code
PatchNet: A Tool for Deep Patch Classification

Thong Hoang, Julia Lawall, Richard J. Oentaryo et al.

This work proposes PatchNet, an automated tool based on hierarchical deep learning for classifying patches by extracting features from commit messages and code changes. PatchNet contains a deep hierarchical structure that mirrors the hierarchical and sequential structure of a code change, differentiating it from the existing deep learning models on source code. PatchNet provides several options allowing users to select parameters for the training process. The tool has been validated in the context of automatic identification of stable-relevant patches in the Linux kernel and is potentially applicable to automate other software engineering tasks that can be formulated as patch classification problems. A video demonstrating PatchNet is available at https://goo.gl/CZjG6X. The PatchNet implementation is available at https://github.com/hvdthong/PatchNetTool.

SE Dec 14, 2020
AndroEvolve: Automated Update for Android Deprecated-API Usages

Stefanus Agus Haryono, Ferdian Thung, David Lo et al.

The Android operating system (OS) is updated often, and each new version may deprecate APIs. Usages of deprecated APIs in Android apps need to be updated to ensure the apps' compatibility with the old and new versions of Android OS. In this work, we propose AndroEvolve, an automated tool to update usages of deprecated Android APIs, that addresses the limitations of the state-of-the-art tool, CocciEvolve. AndroEvolve utilizes data flow analysis to solve the problem of out-of-method-boundary variables, and variable denormalization to remove the temporary variables introduced by CocciEvolve. We evaluated the accuracy of AndroEvolve using a dataset of 360 target files and 20 deprecated Android APIs, where AndroEvolve is able to produce 319 correct updates, compared to CocciEvolve, which produces only 249 correct updates. We also evaluated the readability of AndroEvolve's update results using a manual and an automatic evaluation. Both evaluations demonstrated that the code produced by AndroEvolve has higher readability than CocciEvolve's. A video demonstration of AndroEvolve is available at https://youtu.be/siU0tuMITXI.

SE Nov 10, 2020
AndroEvolve: Automated Android API Update with Data Flow Analysis and Variable Denormalization

Stefanus A. Haryono, Ferdian Thung, David Lo et al.

The Android operating system is frequently updated, with each version bringing a new set of APIs. New versions may involve API deprecation; Android apps using deprecated APIs need to be updated to ensure the apps' compatibility with old and new versions of Android. Updating deprecated APIs is a time-consuming endeavor. Hence, automating the updates of Android APIs can be beneficial for developers. CocciEvolve is the state-of-the-art approach for this automation. However, it has several limitations, including its inability to resolve out-of-method-boundary variables and the low code readability of its updates due to the addition of temporary variables. In an attempt to further improve the performance of automated Android API update, we propose an approach named AndroEvolve, which addresses the limitations of CocciEvolve through the addition of data flow analysis and variable name denormalization. Data flow analysis enables AndroEvolve to resolve the value of any variable within the file scope. Variable name denormalization replaces temporary variables that may be present in the CocciEvolve update with appropriate values in the target file. We have evaluated the performance of AndroEvolve and the readability of its updates on 360 target files. AndroEvolve produces 26.90% more instances of correct updates compared to CocciEvolve. Moreover, our manual and automated evaluations show that AndroEvolve updates are more readable than CocciEvolve updates.

SE Nov 10, 2020
Characterization and Automatic Update of Deprecated Machine-Learning API Usages

Stefanus Agus Haryono, Ferdian Thung, David Lo et al.

Due to the rise of AI applications, machine learning libraries have become far more accessible, with Python being the most common programming language to write them. Machine learning libraries tend to be updated periodically, which may deprecate existing APIs, making it necessary for developers to update their usages. However, updating usages of deprecated APIs is typically not a priority for developers, leading to widespread usages of deprecated APIs, which expose library users to vulnerability issues. In this paper, we built a tool to automate these updates. We first conducted an empirical study to seek a better understanding of how updates of deprecated machine-learning API usages in Python can be done. The study involved a dataset of 112 deprecated APIs from Scikit-Learn, TensorFlow, and PyTorch. We found dimensions of deprecated API migration related to its update operation (i.e., the required operation to perform the migration), API mapping (i.e., the number of deprecated APIs and their corresponding updated APIs), and context dependency (i.e., whether we need to consider surrounding contexts when performing the migration). Guided by the findings of our empirical study, we created MLCatchUp, a tool to automate the update of deprecated API usages in Python that automatically infers the API migration transformation through comparison of the deprecated and updated API signatures. These transformations are expressed in a Domain Specific Language (DSL). We evaluated MLCatchUp using a test dataset containing 258 files with 514 API usages that we collected from public GitHub repositories. In this evaluation, MLCatchUp achieves a precision of 86.19%. We further improve the precision of MLCatchUp by adding a feature that allows it to accept additional user input to specify the transformation constraints in the DSL for context-dependent API migration, where MLCatchUp achieves a precision of 93.58%.
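The idea of inferring a migration by comparing a deprecated and an updated API signature can be sketched with Python's standard `inspect` module. This is only an illustration of signature comparison, not MLCatchUp's inference or its DSL, neither of which is described in this abstract. The functions `old_fit` and `new_fit` and the helper `infer_migration` are hypothetical and do not come from any real library.

```python
import inspect


def infer_migration(deprecated, updated):
    """Compare two callables' signatures and list parameter-level changes."""
    old = inspect.signature(deprecated).parameters
    new = inspect.signature(updated).parameters
    return {
        "rename_function": deprecated.__name__ != updated.__name__,
        "removed_params": [name for name in old if name not in new],
        "added_params": [name for name in new if name not in old],
    }


# Hypothetical deprecated and updated APIs, invented for this sketch.
def old_fit(data, n_iter=10, verbose=False): ...
def new_fit(data, max_iter=10): ...


plan = infer_migration(old_fit, new_fit)
print(plan)
```

A name-based diff like this cannot tell a renamed parameter (`n_iter` to `max_iter`) from an unrelated removal plus addition. That is one place where the context-dependent, user-supplied constraints mentioned in the abstract would plausibly matter.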

SE May 27, 2020
Automatic Android Deprecated-API Usage Update by Learning from Single Updated Example

Stefanus Agus Haryono, Ferdian Thung, Hong Jin Kang et al.

Due to the deprecation of APIs in the Android operating system, developers have to update usages of the APIs to ensure that their applications work for both the past and current versions of Android. Such updates may be widespread, non-trivial, and time-consuming. Therefore, automation of such updates will be of great benefit to developers. AppEvolve, which is the state-of-the-art tool for automating such updates, relies on having before- and after-update examples to learn from. In this work, we propose an approach named CocciEvolve that performs such updates using only a single after-update example. CocciEvolve learns edits by extracting the relevant update to a block of code from an after-update example. From preliminary experiments, we find that CocciEvolve can successfully perform 96 out of 112 updates, with a success rate of 85%.

SE Mar 12, 2020
CC2Vec: Distributed Representations of Code Changes

Thong Hoang, Hong Jin Kang, Julia Lawall et al.

Existing work on software patches often uses features specific to a single task. These works often rely on manually identified features, and human effort is required to identify these features for each task. In this work, we propose CC2Vec, a neural network model that learns a representation of code changes guided by their accompanying log messages, which represent the semantic intent of the code changes. CC2Vec models the hierarchical structure of a code change with the help of the attention mechanism and uses multiple comparison functions to identify the differences between the removed and added code. To evaluate whether CC2Vec can produce a distributed representation of code changes that is general and useful for multiple tasks on software patches, we use the vectors produced by CC2Vec for three tasks: log message generation, bug-fixing patch identification, and just-in-time defect prediction. In all tasks, the models using CC2Vec outperform the state-of-the-art techniques.
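As a rough illustration of what comparing removed and added code with several comparison functions could look like, the sketch below applies three simple functions to two hand-written vectors and concatenates the results. The abstract does not name the comparison functions, so element-wise subtraction, element-wise multiplication, and cosine similarity are illustrative choices, and the vectors stand in for learned embeddings. The function names are hypothetical, not CC2Vec's.

```python
import math


def subtract(removed, added):
    """Element-wise difference between the added and removed code vectors."""
    return [a - r for r, a in zip(removed, added)]


def multiply(removed, added):
    """Element-wise product of the removed and added code vectors."""
    return [r * a for r, a in zip(removed, added)]


def cosine(removed, added):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(r * a for r, a in zip(removed, added))
    norm = math.sqrt(sum(r * r for r in removed)) * math.sqrt(
        sum(a * a for a in added)
    )
    return dot / norm if norm else 0.0


def compare(removed, added):
    """Concatenate the outputs of all comparison functions into one vector."""
    return (
        subtract(removed, added)
        + multiply(removed, added)
        + [cosine(removed, added)]
    )


# Hand-written stand-ins for embeddings of the removed and added code.
removed_vec = [1.0, 0.0, 2.0]
added_vec = [1.0, 1.0, 2.0]

features = compare(removed_vec, added_vec)
print(features)
```

Concatenating several views of the same pair lets a downstream model pick whichever signal suits the task, which fits the abstract's goal of one representation serving multiple patch-related tasks.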

SE Nov 8, 2019
PatchNet: Hierarchical Deep Learning-Based Stable Patch Identification for the Linux Kernel

Thong Hoang, Julia Lawall, Yuan Tian et al.

Linux kernel stable versions serve the needs of users who value stability of the kernel over new features. The quality of such stable versions depends on the initiative of kernel developers and maintainers to propagate bug-fixing patches to the stable versions. Thus, it is desirable to consider to what extent this process can be automated. A previous approach relies on words from commit messages and a small set of manually constructed code features. This approach, however, shows only moderate accuracy. In this paper, we investigate whether deep learning can provide a more accurate solution. We propose PatchNet, a hierarchical deep learning-based approach capable of automatically extracting features from commit messages and commit code and using them to identify stable patches. PatchNet contains a deep hierarchical structure that mirrors the hierarchical and sequential structure of commit code, distinguishing it from the existing deep learning models on source code. Experiments on 82,403 recent Linux patches confirm the superiority of PatchNet over various state-of-the-art baselines, including the one recently adopted by Linux kernel maintainers.