SE Jul 24, 2024
BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning
Partha Chakraborty, Mahmoud Alfadel, Meiyappan Nagappan
Software bugs require developers to exert significant effort to identify and resolve them, often consuming about one-third of their time. Bug localization, the process of pinpointing the exact source code files that need modification, is crucial in reducing this effort. Existing bug localization tools, typically reliant on deep learning techniques, face limitations in cross-project applicability and effectiveness in multi-language environments. Recent advancements with Large Language Models (LLMs) offer detailed representations for bug localization. However, they encounter challenges with limited context windows and mapping accuracy. To address these issues, we propose BLAZE, an approach that employs dynamic chunking and hard example learning. First, BLAZE dynamically segments source code to minimize continuity loss. Then, BLAZE fine-tunes a GPT-based model using challenging bug cases to enhance cross-project and cross-language bug localization. To support the capability of BLAZE, we create the BEETLEBOX dataset, which comprises 26,321 bugs from 29 large and thriving open-source projects across five different programming languages (Java, C++, Python, Go, and JavaScript). Our evaluations of BLAZE on three benchmark datasets (BEETLEBOX, SWE-Bench, and Ye et al.) demonstrate substantial improvements compared to six state-of-the-art baselines. Specifically, BLAZE achieves increases of up to 120% in Top 1 accuracy, 144% in Mean Average Precision (MAP), and 100% in Mean Reciprocal Rank (MRR). An extensive ablation study confirms the contributions of our pipeline components to the overall performance enhancement.
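The abstract does not describe how the dynamic segmentation works. As a rough illustration of the general idea only, the sketch below splits a file into chunks that fit a token budget while preferring to break at blank lines rather than at a fixed offset. The function name `chunk_source`, the whitespace-based token count, and the blank-line heuristic are all assumptions made for this example; they are not BLAZE's algorithm.

```python
def chunk_source(source: str, max_tokens: int = 50) -> list[str]:
    """Split source text into chunks of roughly max_tokens whitespace tokens.

    Hypothetical heuristic (not taken from the paper): lines are grouped into
    blocks at blank-line boundaries, and whole blocks are packed into a chunk
    until the budget would be exceeded. A block larger than the budget is
    split line by line; a single over-long line is kept whole.
    """
    # Step 1: group non-blank lines into blocks separated by blank lines.
    blocks, current = [], []
    for line in source.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    def n_tokens(lines):
        # Crude stand-in for a real tokenizer: count whitespace-separated words.
        return sum(len(l.split()) for l in lines)

    # Step 2: pack blocks greedily into chunks.
    chunks, buf = [], []
    for block in blocks:
        if n_tokens(block) > max_tokens:
            # Oversized block: flush what we have, then split it by lines.
            if buf:
                chunks.append("\n".join(buf))
                buf = []
            for line in block:
                if buf and n_tokens(buf + [line]) > max_tokens:
                    chunks.append("\n".join(buf))
                    buf = []
                buf.append(line)
            if buf:
                chunks.append("\n".join(buf))
                buf = []
        elif buf and n_tokens(buf) + n_tokens(block) > max_tokens:
            # Block does not fit: close the current chunk at the block boundary.
            chunks.append("\n".join(buf))
            buf = list(block)
        else:
            if buf:
                buf.append("")  # keep one blank line between packed blocks
            buf.extend(block)
    if buf:
        chunks.append("\n".join(buf))
    return chunks
```

Breaking at block boundaries keeps related statements together in one chunk, which is one plausible way to limit the loss of continuity that fixed-size splitting causes.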
SE Jul 3, 2024
Revisiting the Performance of Deep Learning-Based Vulnerability Detection on Realistic Datasets
Partha Chakraborty, Krishna Kanth Arumugam, Mahmoud Alfadel et al.
The impact of software vulnerabilities on everyday software systems is significant. Despite deep learning models being proposed for vulnerability detection, their reliability is questionable. Prior evaluations show high recall/F1 scores of up to 99%, but these models underperform in practical scenarios, particularly when assessed on entire codebases rather than just the fixing commit. This paper introduces Real-Vul, a comprehensive dataset representing real-world scenarios for evaluating vulnerability detection models. Evaluating DeepWukong, LineVul, ReVeal, and IVDetect shows a significant drop in performance, with precision decreasing by up to 95 percentage points and F1 scores by up to 91 points. Furthermore, model performance fluctuates based on vulnerability characteristics, with better F1 scores for information leaks or code injection than for path resolution or predictable return values. The results highlight a significant performance gap that needs addressing before deploying deep learning-based vulnerability detection in practical settings. Overfitting is identified as a key issue, and an augmentation technique is proposed, potentially improving performance by up to 30%. Contributions include a dataset creation approach for better model evaluation, the Real-Vul dataset, and empirical evidence of deep learning models struggling in real-world settings.
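For readers less familiar with the reported metrics, the sketch below computes precision, recall, and F1 from confusion-matrix counts and shows what a drop measured in percentage points means. The counts are invented for illustration and are not taken from the paper; `drop_in_points` is a hypothetical helper name.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def drop_in_points(before: float, after: float) -> float:
    """Absolute difference between two ratios, expressed in percentage points."""
    return (before - after) * 100


# Invented counts: a balanced benchmark versus a setting with many false alarms.
p_bench, r_bench, f1_bench = precision_recall_f1(tp=90, fp=10, fn=10)
p_real, r_real, f1_real = precision_recall_f1(tp=8, fp=92, fn=2)
print(f"precision drop: {drop_in_points(p_bench, p_real):.1f} points")
print(f"F1 drop: {drop_in_points(f1_bench, f1_real):.1f} points")
```

A drop in percentage points is an absolute difference between two scores, which is not the same as a relative (percent) change.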
SE Jun 25, 2024
Aligning Programming Language and Natural Language: Exploring Design Choices in Multi-Modal Transformer-Based Embedding for Bug Localization
Partha Chakraborty, Venkatraman Arumugam, Meiyappan Nagappan
Bug localization refers to identifying the source code files, written in a programming language, that are responsible for the unexpected behavior of software, using the bug report, which is written in natural language. As bug localization is labor-intensive, bug localization models are employed to assist software developers. Due to the domain difference between source code files and bug reports, modern bug localization systems, based on deep learning models, rely heavily on embedding techniques that project bug reports and source code files into a shared vector space. The creation of an embedding involves several design choices, but the impact of these choices on the quality of the embedding and the performance of bug localization models remains unexplained in current research. To address this gap, our study evaluated 14 distinct embedding models to gain insights into the effects of various design choices. Subsequently, we developed bug localization models utilizing these embedding models to assess the influence of these choices on the performance of the localization models. Our findings indicate that the pre-training strategies significantly affect the quality of the embedding. Moreover, we discovered that the familiarity of the embedding models with the data has a notable impact on the bug localization model's performance. Notably, when the training and testing data are collected from different projects, the performance of the bug localization models exhibits substantial fluctuations.
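To make the shared-vector-space idea concrete, here is a minimal sketch of a ranking step: once a bug report and the candidate files are embedded as vectors, the files can be ordered by cosine similarity to the report. The vectors are toy values and `rank_files` is a hypothetical helper; the abstract does not state which similarity measure or ranking model the studied systems use.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_files(report_vec: list[float],
               file_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (path, score) pairs sorted from most to least similar to the report."""
    scored = [(path, cosine(report_vec, vec)) for path, vec in file_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy embeddings standing in for the output of an embedding model.
report = [1.0, 0.0]
files = {"a.py": [0.0, 1.0], "b.py": [1.0, 0.0], "c.py": [1.0, 1.0]}
for path, score in rank_files(report, files):
    print(f"{path}: {score:.3f}")
```

In this setup, the quality of the ranking depends entirely on how well the embedding places a report near the files that caused it, which is why the design choices behind the embedding matter.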
SE May 9, 2023
RLocator: Reinforcement Learning for Bug Localization
Partha Chakraborty, Mahmoud Alfadel, Meiyappan Nagappan
Software developers spend a significant portion of time fixing bugs in their projects. To streamline this process, bug localization approaches have been proposed to identify the source code files that are likely responsible for a particular bug. Prior work proposed several similarity-based machine-learning techniques for bug localization. Despite significant advances in these techniques, they do not directly optimize the evaluation measures. We argue that directly optimizing evaluation measures can positively contribute to the performance of bug localization approaches. Therefore, in this paper, we utilize Reinforcement Learning (RL) techniques to directly optimize the ranking metrics. We propose RLocator, a Reinforcement Learning-based bug localization approach. We formulate RLocator using a Markov Decision Process (MDP) to optimize the evaluation measures directly. We present the technique and experimentally evaluate it based on a benchmark dataset of 8,316 bug reports from six highly popular Apache projects. The results of our evaluation reveal that RLocator achieves a Mean Reciprocal Rank (MRR) of 0.62, a Mean Average Precision (MAP) of 0.59, and a Top 1 score of 0.46. We compare RLocator with two state-of-the-art bug localization tools, FLIM and BugLocator. Our evaluation reveals that RLocator outperforms both approaches by a substantial margin, with improvements of 38.3% in MAP, 36.73% in MRR, and 23.68% in the Top K metric. These findings highlight that directly optimizing evaluation measures contributes considerably to performance improvement on the bug localization problem.
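The ranking measures named above can be sketched as follows, using the common information-retrieval definitions. This is an illustrative sketch, not the paper's evaluation code: the function names are hypothetical, and the paper may normalize average precision differently (here it is divided by the number of relevant files).

```python
def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant file, or 0.0 if none is ranked."""
    for rank, path in enumerate(ranked, start=1):
        if path in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of precision@rank taken at each rank where a relevant file appears."""
    hits, total = 0, 0.0
    for rank, path in enumerate(ranked, start=1):
        if path in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def top_k_hit(ranked: list[str], relevant: set[str], k: int) -> bool:
    """True if at least one relevant file is among the first k results."""
    return any(path in relevant for path in ranked[:k])


def evaluate(results: list[tuple[list[str], set[str]]], k: int = 1) -> dict[str, float]:
    """Average the per-bug scores over all (ranking, relevant files) pairs."""
    n = len(results)
    return {
        "MRR": sum(reciprocal_rank(r, rel) for r, rel in results) / n,
        "MAP": sum(average_precision(r, rel) for r, rel in results) / n,
        f"Top {k}": sum(top_k_hit(r, rel, k) for r, rel in results) / n,
    }


# Two toy bug reports, each with a ranked file list and its set of buggy files.
toy = [(["a", "b", "c"], {"a"}), (["a", "b", "c"], {"b", "c"})]
print(evaluate(toy, k=1))
```

All three measures depend only on the positions of the buggy files in the ranked list, which is what makes them natural targets for a method that optimizes the ranking directly.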
SE May 4, 2021
How do developers discuss and support new programming languages in technical Q&A site? An empirical study of Go, Swift, and Rust in Stack Overflow
Partha Chakraborty, Rifat Shahriyar, Anindya Iqbal et al.
New programming languages (e.g., Swift, Go, Rust) are being introduced to give developers a better opportunity to make software development robust and easy. At the early stage, a programming language is likely to have resource constraints that encourage developers to seek help frequently from experienced peers active on Q&A sites such as Stack Overflow (SO). In this study, we have formally studied the discussions on three popular new languages introduced after the inception of SO (2008) and matched those with the relevant activities on GitHub whenever appropriate. For that purpose, we have mined 41,782,536 questions and answers from SO, along with information on 7,846 issues and 660,965 repositories from GitHub. Initially, the development of new languages is relatively slow compared to mature languages (e.g., C, C++, Java). The expected outcome of this study is to reveal the difficulties and challenges faced by developers working with these languages so that appropriate measures can be taken to expedite the generation of relevant resources. We have used the LDA method on SO's questions and answers to identify different topics of new languages. We have extracted several features of the answer pattern of the new languages from SO to study their characteristics. These attributes were used to identify difficult topics. We explored the background of developers who are contributing to these languages. We have created a model by combining Stack Overflow data with issue, repository, and user data from GitHub. Finally, we have used that model to identify factors that affect language evolution. We believe that the outcome of our study is likely to help the owners and sponsors of these languages design better features and documentation. It will also help software developers and students prepare themselves to work with these languages in an informed way.