Nyyti Saarimäki

h-index: 9

5 papers

492 citations

Novelty: 20%

AI Score: 22

Ranked #177,959 of 194,257 authors (top 92%); #2,235 in SE (top 74%)

5 Papers

17.2 | SE | Aug 25, 2019 | Code

Does Code Quality Affect Pull Request Acceptance? An empirical study

Valentina Lenarduzzi, Vili Nikkola, Nyyti Saarimäki et al.

Background. Pull requests are a common practice for contributing and reviewing contributions, and are employed both in open-source and industrial contexts. One of the main goals of code reviews is to find defects in the code, allowing project maintainers to easily integrate external contributions into a project and discuss the code contributions. Objective. The goal of this paper is to understand whether code quality is actually considered when pull requests are accepted. Specifically, we aim to understand whether code quality issues such as code smells, antipatterns, and coding style violations in the pull request code affect the chance of its acceptance when reviewed by a maintainer of the project. Method. We conducted a case study among 28 Java open-source projects, analyzing the presence of 4.7 M code quality issues in 36 K pull requests. We analyzed further correlations by applying seven machine learning techniques (Logistic Regression, Decision Tree, Random Forest, Extremely Randomized Trees, AdaBoost, Gradient Boosting, XGBoost). Results. Unexpectedly, code quality turned out not to affect the acceptance of a pull request at all. As suggested by other works, other factors such as the reputation of the maintainer and the importance of the feature delivered might be more important than code quality in terms of pull request acceptance. Conclusions. Researchers have already investigated the influence of developers' reputation on pull request acceptance. This is the first work investigating whether the quality of the code in pull requests affects their acceptance. We recommend that researchers further investigate this topic to understand whether different measures or different tools could prove useful.

5.5 | SE | Jun 25

Cleaning Logs for Downstream Tasks (Registered Report)

Zahra G. Yazdi, Van-Hoang Le, Nyyti Saarimäki et al.

Background: Software systems generate logs during execution to record critical events and runtime information for troubleshooting and monitoring. However, in practice, logs often contain significant amounts of redundant and irrelevant information, which can negatively impact the performance of downstream analysis tasks, such as model inference and anomaly detection. Objective: The objective of this study is to clean log data by identifying and removing free-standing messages -- messages that are not relevant to the execution behaviors of interest and are interleaved with messages capturing the system's functional behavior. Method: To address this objective, we propose LogPurifier, a task-agnostic log-cleaning approach based on dependency relationships between log message templates. The paper presents a plan for an empirical evaluation using a controlled experimental design to assess the impact of LogPurifier on the effectiveness and efficiency of two downstream tasks: model inference and anomaly detection.

17.6 | SE | Jan 21, 2021

A Critical Comparison on Six Static Analysis Tools: Detection, Agreement, and Precision

Valentina Lenarduzzi, Savanna Lujan, Nyyti Saarimäki et al.

Background. Developers use Automated Static Analysis Tools (ASATs) to control for potential quality issues in source code, including defects and technical debt. Tool vendors have devised quite a number of tools, which makes it harder for practitioners to select the most suitable one for their needs. To better support developers, researchers have been conducting several studies on ASATs to improve the understanding of their actual capabilities. Aims. Despite the work done so far, there is still a lack of knowledge regarding (1) which source quality problems can actually be detected by static analysis tool warnings, (2) to what extent the tools agree, and (3) how precise their recommendations are. We aim to bridge this gap by proposing a large-scale comparison of six popular static analysis tools for Java projects: Better Code Hub, CheckStyle, Coverity Scan, FindBugs, PMD, and SonarQube. Method. We analyze 47 Java projects and derive a taxonomy of warnings raised by 6 state-of-the-practice ASATs. To assess their agreement, we compare them by manually analyzing, at line level, whether they identify the same issues. Finally, we manually evaluate the precision of the tools. Results. The key results report a comprehensive taxonomy of ASAT warnings, and show little to no agreement among the tools and a low degree of precision. Conclusions. We provide a taxonomy that can be useful to researchers, practitioners, and tool vendors to map the current capabilities of the tools. Furthermore, our study provides the first overview of the agreement among different tools as well as an extensive analysis of their precision.

30.5 | SE | Oct 7, 2020 | Code

Empirical Standards for Software Engineering Research

Paul Ralph, Nauman bin Ali, Sebastian Baltes et al.

Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair.

6.9 | SE | Aug 5, 2019

An Empirical Study on Technical Debt in a Finnish SME

Valentina Lenarduzzi, Teemu Orava, Nyyti Saarimäki et al.

Objective. In this work, we report the experience of a Finnish SME in managing Technical Debt (TD), investigating the most common types of TD they faced in the past, their causes, and their effects. Method. We set up a focus group in the case company, involving different roles. Results. The results showed that the most significant TD in the company stems from disagreements with the supplier and lack of test automation. Specification and test TD are the most significant types of TD. Budget and time constraints were identified as the most important root causes of TD. Conclusion. TD occurs when time or budget is limited or the amount of work is not understood properly. However, not all postponed activities generated "debt". Sometimes the accumulation of TD helped meet deadlines without a major impact, while in other cases the cost of repaying the TD was much higher than the benefits. From this study, we learned that learning, careful estimations, and continuous improvement could be good strategies to mitigate TD. These strategies include iterative validation with customers, efficient communication with stakeholders, meta-cognition in estimations, and value orientation in budgeting and scheduling.