Alireza Aghamohammadi

SE
4 papers
59 citations
Novelty: 30%
AI Score: 18

4 Papers

SE · Nov 12, 2020
A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits

Steffen Herbold, Alexander Trautsch, Benjamin Ledel et al.

Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.
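The consensus rule in this abstract (four labels per line, consensus when at least three agree) can be sketched in a few lines of Python. This is only an illustration of the rule as stated: the label names, the file/line keys, and the function name `consensus_label` are hypothetical and are not taken from the paper's data set or tooling.

```python
from collections import Counter


def consensus_label(labels, threshold=3):
    """Return the label that at least `threshold` participants chose, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= threshold else None


# Hypothetical votes from four participants for three changed lines.
line_labels = {
    "Foo.java:10": ["bugfix", "bugfix", "bugfix", "refactoring"],
    "Foo.java:11": ["bugfix", "test", "refactoring", "bugfix"],
    "FooTest.java:5": ["test", "test", "test", "test"],
}

# Lines that map to None have no consensus among the participants.
consensus = {line: consensus_label(votes) for line, votes in line_labels.items()}
no_consensus = [line for line, label in consensus.items() if label is None]
```

With four votes and a threshold of three, at most one label can reach consensus, so taking the single most common label is sufficient.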

SE · May 23, 2020
The Threat to the Validity of Predictive Mutation Testing: The Impact of Uncovered Mutants

Alireza Aghamohammadi, Seyed-Hassan Mirian-Hosseinabadi

Predictive Mutation Testing (PMT) is a technique to predict whether a mutant will be killed by using machine learning approaches. Researchers have proposed various machine learning methods for PMT under the cross-project setting. However, they did not consider the impact of uncovered mutants. A mutant is uncovered if the statement on which the mutant is generated is not executed by any test cases. We show that uncovered mutants inflate previous PMT results. Moreover, we aim at proposing an alternative approach to improve PMT and suggesting a different interpretation for cross-project PMT. We replicated the previous PMT research. We also proposed an approach based on the combination of Random Forest and Gradient Boosting to improve the PMT results. We empirically evaluated our approach on the same 654 Java projects provided by the previous PMT literature. Our results indicate that the performance of PMT drastically decreases in terms of AUC from 0.83 to 0.51. Furthermore, PMT performs worse than random guesses on 27% of the test projects. The proposed approach improves the PMT results by achieving the average AUC value of 0.61.
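The definition of an uncovered mutant given in this abstract can be illustrated with a small Python sketch that splits mutants into covered and uncovered groups. The data layout (a `statement` key per mutant and a `coverage` map from statement to test names) is an assumption made for illustration and is not the format used in the paper.

```python
def is_uncovered(mutant, coverage):
    """A mutant is uncovered if no test case executes its mutated statement."""
    return not coverage.get(mutant["statement"], set())


# Hypothetical mutants and per-statement test coverage.
mutants = [
    {"id": 1, "statement": "A.java:12", "killed": True},
    {"id": 2, "statement": "A.java:30", "killed": False},
    {"id": 3, "statement": "B.java:7", "killed": False},
]
coverage = {
    "A.java:12": {"testA"},
    "A.java:30": {"testA", "testB"},
    # "B.java:7" is missing: no test executes that statement.
}

covered = [m for m in mutants if not is_uncovered(m, coverage)]
uncovered = [m for m in mutants if is_uncovered(m, coverage)]
```

Because no test ever executes the mutated statement, an uncovered mutant cannot be killed by the suite, so its outcome is known without any prediction. Evaluating a predictor on the `covered` subset only is one way to see the effect the abstract describes.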

SE · Apr 14, 2020
An Analysis of Python's Topics, Trends, and Technologies Through Mining Stack Overflow Discussions

Hamed Tahmooresi, Abbas Heydarnoori, Alireza Aghamohammadi

Python is a popular, widely used, general-purpose programming language. In spite of its ever-growing community, researchers have not performed much analysis of Python's topics, trends, and technologies, which could provide insights for developers about Python community trends and main issues. In this article, we examine the main topics related to this language being discussed by developers on one of the most popular Q&A websites, Stack Overflow, as well as temporal trends, through mining 2,461,876 posts. To be more useful for software engineers, we study what Python provides as the alternative to popular technologies offered by common programming languages like Java. Our results indicate that discussions about Python standard features, web programming, and scientific programming (programming in areas such as mathematics, data science, statistics, machine learning, natural language processing (NLP), and so forth) are the most popular areas in the Python community. At the same time, areas related to scientific programming are steadily receiving more attention from Python developers.

SE · Dec 11, 2018
Generating Summaries for Methods of Event-Driven Programs: an Android Case Study

Alireza Aghamohammadi, Maliheh Izadi, Abbas Heydarnoori

The lack of proper documentation makes program comprehension a cumbersome process for developers. Source code summarization is one of the existing solutions to this problem. Many approaches have been proposed to summarize source code in recent years. A prevalent weakness of these solutions is that they do not pay much attention to interactions among the elements of a software system. An element is simply a callable code snippet such as a method or even a clickable button. As a result, these approaches cannot be applied to event-driven programs, such as Android applications, because they have specific features such as numerous interactions between their elements. To tackle this problem, we propose a novel approach based on deep neural networks and dynamic call graphs to generate summaries for methods of event-driven programs. First, we collect a set of comment/code pairs from GitHub and train a deep neural network on the set. Afterward, by exploiting a dynamic call graph, the PageRank algorithm, and the pre-trained deep neural network, we generate summaries. An empirical evaluation with 14 real-world Android applications and 42 participants indicates 32.3% BLEU4, which is a definite improvement compared to the existing state-of-the-art techniques. We also assessed the informativeness and naturalness of our generated summaries from developers' perspectives and showed they are sufficiently understandable and informative.
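The abstract does not say how PageRank is applied to the dynamic call graph, so the following Python sketch shows only the standard PageRank power iteration on a small, hypothetical call graph. The method names, the damping factor of 0.85, and the iteration count are assumptions for illustration and do not reproduce the authors' pipeline.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Standard PageRank by power iteration over a caller -> callees mapping."""
    nodes = set(graph) | {callee for callees in graph.values() for callee in callees}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing calls is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for caller, callees in graph.items():
            if not callees:
                continue
            share = damping * rank[caller] / len(callees)
            for callee in callees:
                new_rank[callee] += share
        rank = new_rank
    return rank


# Hypothetical dynamic call graph of an Android click handler.
call_graph = {
    "onClick": ["loadData", "updateView"],
    "loadData": ["parseJson"],
    "updateView": ["parseJson"],
    "parseJson": [],
}

rank = pagerank(call_graph)
most_central = max(rank, key=rank.get)
```

In this toy graph, `parseJson` is reached from both `loadData` and `updateView`, so it receives the highest score. A ranking like this could be used to weight which called methods matter most when describing a caller, though that use is an assumption here.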