Markku Oivo

h-index39

19papers

5,110citations

Novelty20%

AI Score36

Ranked #101,715 of 194,257 authors (top 52%)#1,097 in SE (top 36%)

19 Papers

8.0SEJul 8, 2024

6GSoft: Software for Edge-to-Cloud Continuum

Muhammad Azeem Akbar, Matteo Esposito, Sami Hyrynsalmi et al.

In the era of 6G, developing and managing software requires cutting-edge software engineering (SE) theories and practices tailored for such complexity across a vast number of connected edge devices. Our project aims to lead the development of sustainable methods and energy-efficient orchestration models specifically for edge environments, enhancing architectural support driven by AI for contemporary edge-to-cloud continuum computing. This initiative seeks to position Finland at the forefront of the 6G landscape, focusing on sophisticated edge orchestration and robust software architectures to optimize the performance and scalability of edge networks. Collaborating with leading Finnish universities and companies, the project emphasizes deep industry-academia collaboration and international expertise to address critical challenges in edge orchestration and software architecture, aiming to drive significant advancements in software productivity and market impact.

4.5SEJun 24

Engineering a Governance-Aware AI Sandbox: Design, Implementation, and Lessons Learned

Muhammad Waseem, Md Aidul Islam, Md Nasir Uddin Shuvo et al.

Collaborative AI experimentation in industry-academia requires environments that support rapid trials while maintaining controlled access, organisational isolation, and traceable workflows. Although interest in AI sandboxes is increasing, practical guidance on designing and building governance-aware experimentation platforms remains limited. This work designs and operationalizes a governance-aware, multi-tenant AI sandbox that supports structured experimentation and produces reusable evaluation evidence across stakeholders. The sandbox was developed in an industry-academia ecosystem using iteratively validated requirements gathered from industrial partners. The solution adopts a layered reference architecture that separates a multi-tenant presentation layer from a backend control plane and isolates execution and data management concerns into dedicated layers. The sandbox supports governed onboarding, project-based collaboration, controlled access to AI services, and traceable experimentation through approval workflows and audit logging. By structuring experiment context and governance decisions as persistent records, the sandbox enables evaluation evidence to be reused and compared across projects and stakeholders. The development experience yields lessons learned and practical considerations that inform deployment and future evolution of governance-aware sandbox platforms.

5.9SEMar 18

Requirements Volatility in Software Architecture Design: An Exploratory Case Study

Sanja Aaramaa, Sandun Dasanayake, Markku Oivo et al.

Requirements volatility is a major issue in software (SW) development, causing problems such as project delays and cost overruns. Even though there is a considerable amount of research related to requirement volatility, the majority of it is inclined toward project management aspects. The relationship between SW architecture design and requirements volatility has not been researched widely, even though changing requirements may for example lead to higher defect density during testing. An exploratory case study was conducted to study how requirements volatility affects SW architecture design. Fifteen semi-structured, thematic interviews were conducted in the case company, which provides the selection of software products for business customers and consumers. The research revealed the factors, such as requirements uncertainty and dynamic business environment, causing requirements volatility in the case company. The study identified the challenges that requirements volatility posed to SW architecture design, including scheduling and architectural technical debt. In addition, this study discusses means of mitigating the factors that cause requirements volatility and addressing the challenges posed by requirements volatility. SW architects are strongly influenced by requirement volatility. Thus understanding the factors causing requirements volatility as well as means to mitigate the challenges has high industrial relevance.

14.7SESep 1, 2018Code

Test Prioritization in Continuous Integration Environments

Alireza Haghighatkhah, Mika Mäntylä, Markku Oivo et al.

Two heuristics namely diversity-based (DBTP) and history-based test prioritization (HBTP) have been separately proposed in the literature. Yet, their combination has not been widely studied in continuous integration (CI) environments. The objective of this study is to catch regression faults earlier, allowing developers to integrate and verify their changes more frequently and continuously. To achieve this, we investigated six open-source projects, each of which included several builds over a large time period. Findings indicate that previous failure knowledge seems to have strong predictive power in CI environments and can be used to effectively prioritize tests. HBTP does not necessarily need to have large data, and its effectiveness improves to a certain degree with larger history interval. DBTP can be used effectively during the early stages, when no historical data is available, and also combined with HBTP to improve its effectiveness. Among the investigated techniques, we found that history-based diversity using NCD Multiset is superior in terms of effectiveness but comes with relatively higher overhead in terms of method execution time. Test prioritization in CI environments can be effectively performed with negligible investment using previous failure knowledge, and its effectiveness can be further improved by considering dissimilarities among the tests.

4.9SESep 1, 2018Code

Test Case Prioritization Using Test Similarities

Alireza Haghighatkhah, Mika Mäntylä, Markku Oivo et al.

A classical heuristic in software testing is to reward diversity, which implies that a higher priority must be assigned to test cases that differ the most from those already prioritized. This approach is commonly known as similarity-based test prioritization (SBTP) and can be realized using a variety of techniques. The objective of our study is to investigate whether SBTP is more effective at finding defects than random permutation, as well as determine which SBTP implementations lead to better results. To achieve our objective, we implemented five different techniques from the literature and conducted an experiment using the defects4j dataset, which contains 395 real faults from six real-world open-source Java programs. Findings indicate that running the most dissimilar test cases early in the process is largely more effective than random permutation (Vargha-Delaney A [VDA]: 0.76-0.99 observed using normalized compression distance). No technique was found to be superior with respect to the effectiveness. Locality-sensitive hashing was, to a small extent, less effective than other SBTP techniques (VDA: 0.38 observed in comparison to normalized compression distance), but its speed largely outperformed the other techniques (i.e., it was approximately 5-111 times faster). Our results bring to mind the well-known adage, "don't put all your eggs in one basket". To effectively consume a limited testing budget, one should spread it evenly across different parts of the system by running the most dissimilar test cases early in the testing process.

3.6SEOct 5, 2021

Towards optimal quality requirement documentation in agile software development: a multiple case study

Woubshet Behutiye, Pilar Rodríguez, Markku Oivo et al.

Context-Agile software development (ASD) promotes minimal documentation and often prioritizes functional requirements over quality requirements (QRs). The minimal documentation emphasis may be beneficial in reducing time-to-market for software. However, it can also be a concern, especially with QRs, since they are challenging to specify and document and are crucial for software success. Therefore, understanding how practitioners perceive the importance of QR documentation is valuable because it can provide insight into how they approach this task. It also helps in developing models and guidelines that support the documentation of QRs in ASD, which is a research gap. Objective: We aim to understand practitioners' perceptions of QR documentation and factors influencing this task to derive a model that supports optimal QR documentation in ASD. Method: We conducted a multiple case study involving 12 participants from three cases that apply ASD. Please refer to the document to read the full version of the abstract.

10.4SENov 24, 2020

A Family of Experiments on Test-Driven Development

Adrian Santos, Sira Vegas, Oscar Dieste et al.

Context: Test-driven development (TDD) is an agile software development approach that has been widely claimed to improve software quality. However, the extent to which TDD improves quality appears to be largely dependent upon the characteristics of the study in which it is evaluated (e.g., the research method, participant type, programming environment, etc.). The particularities of each study make the aggregation of results untenable. Objectives: The goal of this paper is to: increase the accuracy and generalizability of the results achieved in isolated experiments on TDD, provide joint conclusions on the performance of TDD across different industrial and academic settings, and assess the extent to which the characteristics of the experiments affect the quality-related performance of TDD. Method: We conduct a family of 12 experiments on TDD in academia and industry. We aggregate their results by means of meta-analysis. We perform exploratory analyses to identify variables impacting the quality-related performance of TDD. Results: TDD novices achieve a slightly higher code quality with iterative test-last development (i.e., ITL, the reverse approach of TDD) than with TDD. The task being developed largely determines quality. The programming environment, the order in which TDD and ITL are applied, or the learning effects from one development approach to another do not appear to affect quality. The quality-related performance of professionals using TDD drops more than for students. We hypothesize that this may be due to their being more resistant to change and potentially less motivated than students. Conclusion: Previous studies seem to provide conflicting results on TDD performance (i.e., positive vs. negative, respectively). We hypothesize that these conflicting results may be due to different study durations, experiment participants being unfamiliar with the TDD process...

7.3SENov 5, 2020Code

Comparing the Results of Replications in Software Engineering

Adrian Santos, Sira Vegas, Markku Oivo et al.

Context: It has been argued that software engineering replications are useful for verifying the results of previous experiments. However, it has not yet been agreed how to check whether the results hold across replications. Besides, some authors suggest that replications that do not verify the results of previous experiments can be used to identify contextual variables causing the discrepancies. Objective: Study how to assess the (dis)similarity of the results of SE replications when they are compared to verify the results of previous experiments and understand how to identify whether contextual variables are influencing results. Method: We run simulations to learn how different ways of comparing replication results behave when verifying the results of previous experiments. We illustrate how to deal with context-induced changes. To do this, we analyze three groups of replications from our own research on test-driven development and testing techniques. Results: The direct comparison of p-values and effect sizes does not appear to be suitable for verifying the results of previous experiments and examining the variables possibly affecting the results in software engineering. Analytical methods such as meta-analysis should be used to assess the similarity of software engineering replication results and identify discrepancies in results. Conclusion: The results achieved in baseline experiments should no longer be regarded as a result that needs to be reproduced, but as a small piece of evidence within a larger picture that only emerges after assembling many small pieces to complete the puzzle.

10.4SEApr 11, 2020

A Procedure and Guidelines for Analyzing Groups of Software Engineering Replications

Adrian Santos, Sira Vegas, Markku Oivo et al.

Context: Researchers from different groups and institutions are collaborating on building groups of experiments by means of replication (i.e., conducting groups of replications). Disparate aggregation techniques are being applied to analyze groups of replications. The application of unsuitable techniques to aggregate replication results may undermine the potential of groups of replications to provide in-depth insights from experiment results. Objectives: Provide an analysis procedure with a set of embedded guidelines to aggregate software engineering (SE) replication results. Method: We compare the characteristics of groups of replications for SE and other mature experimental disciplines such as medicine and pharmacology. In view of their differences, the limitations with regard to the joint data analysis of groups of SE replications and the guidelines provided in mature experimental disciplines to analyze groups of replications, we build an analysis procedure with a set of embedded guidelines specifically tailored to the analysis of groups of SE replications. We apply the proposed analysis procedure to a representative group of SE replications to illustrate its use. Results: All the information contained within the raw data should be leveraged during the aggregation of replication results. The analysis procedure that we propose encourages the use of stratified individual participant data and aggregated data in tandem to analyze groups of SE replications. Conclusion: The aggregation techniques used to analyze groups of replications should be justified in research articles. This will increase the reliability and transparency of joint results. The proposed guidelines should ease this endeavor.

12.8SEFeb 6, 2020

Management of quality requirements in agile and rapid software development: A systematic mapping study

Woubshet Behutiye, Pertti Karhapää, Lidia Lopez et al.

Context:Quality requirements (QRs) describe the desired quality of software, and they play an important role in the success of software projects. In agile software development (ASD), QRs are often ill-defined and not well addressed due to the focus on quickly delivering functionality. Rapid software development (RSD) approaches (e.g., continuous delivery and continuous deployment), which shorten delivery times, are more prone to neglect QRs. Despite the significance of QRs in both ASD and RSD, there is limited synthesized knowledge on their management in those approaches. Objective:This study aims to synthesize state-of-the-art knowledge about QR management in ASD and RSD, focusing on three aspects: bibliometric, strategies, and challenges. Research method:Using a systematic mapping study with a snowballing search strategy, we identified and structured the literature on QR management in ASD and RSD. Check the PDF file to see the full abstract and document.

5.0SEOct 22, 2019

Kuksa: A Cloud-Native Architecture for Enabling Continuous Delivery in the Automotive Domain

Ahmad Banijamali, Pooyan Jamshidi, Pasi Kuvaja et al.

Connecting vehicles to cloud platforms has enabled innovative business scenarios while raising new quality concerns, such as reliability and scalability, which must be addressed by research. Cloud-native architectures based on microservices are a recent approach to enable continuous delivery and to improve service reliability and scalability. We propose an approach for restructuring cloud platform architectures in the automotive domain into a microservices architecture. To this end, we adopted and implemented microservices patterns from literature to design the cloud-native automotive architecture and conducted a laboratory experiment to evaluate the reliability and scalability of microservices in the context of a real-world project in the automotive domain called Eclipse Kuksa. Findings indicated that the proposed architecture could handle the continuous software delivery over-the-air by sending automatic control messages to a vehicular setting. Different patterns enabled us to make changes or interrupt services without extending the impact to others. The results of this study provide evidences that microservices are a potential design solution when dealing with service failures and high payload on cloud-based services in the automotive domain.

4.9SESep 6, 2018

Improving Development Practices through Experimentation: an Industrial TDD Case

Adrian Santos, Jaroslav Spisak, Markku Oivo et al.

Test-Driven Development (TDD), an agile development approach that enforces the construction of software systems by means of successive micro-iterative testing coding cycles, has been widely claimed to increase external software quality. In view of this, some managers at Paf-a Nordic gaming entertainment company-were interested in knowing how would TDD perform at their premises. Eventually, if TDD outperformed their traditional way of coding (i.e., YW, short for Your Way), it would be possible to switch to TDD considering the empirical evidence achieved at the company level. We conduct an experiment at Paf to evaluate the performance of TDD, YW and the reverse approach of TDD (i.e., ITL, short for Iterative-Test Last) on external quality. TDD outperforms YW and ITL at Paf. Despite the encouraging results, we cannot recommend Paf to immediately adopt TDD as the difference in performance between YW and TDD is small. However, as TDD looks promising at Paf, we suggest to move some developers to TDD and to run a future experiment to compare the performance of TDD and YW. TDD slightly outperforms ITL in controlled experiments for TDD novices. However, more industrial experiments are still needed to evaluate the performance of TDD in real-life contexts.

2.7SESep 4, 2018

Software Process Measurement and Related Challenges in Agile Software Development: A Multiple Case Study

Prabhat Ram, Pilar Rodriguez, Markku Oivo

Existing scientific literature highlights the importance of metrics in Agile Software Development (ASD). Still, empirical investigation into metrics in ASD is scarce, particularly in identifying the rationale and the operational challenges associated with metrics. Under the Q-Rapids project (Horizon 2020), we conducted a multiple case study at four Agile companies, using the Goal Question Metric (GQM) approach, to investigate the rationale explaining the choice of process metrics in ASD, and challenges faced in operationalizing them. Results reflect that companies are interested in assessing process aspects like velocity, testing performance, and estimation accuracy, and they prefer custom metrics for these assessments. Companies use metrics as a means to access and even capitalize on the data, erstwhile inaccessible due to technical or process constraints. However, development context of a company can hinder metrics operationalization, manifesting primarily as unavailability of the data required to measure metrics...(check the paper for full Abstract)

2.7SEJul 18, 2018

Moving Beyond the Mean: Analyzing Variance in Software Engineering Experiments

Adrian Santos, Markku Oivo, Natalia Juristo

Software Engineering (SE) experiments are traditionally analyzed with statistical tests (e.g., t-tests, ANOVAs, etc.) that assume equally spread data across treatments (i.e., the homogeneity of variances assumption). Differences across treatments' variances in SE are not seen as an opportunity to gain insights on technology performance, but instead, as a hindrance to analyze the data. We have studied the role of variance in mature experimental disciplines such as medicine. We illustrate the extent to which variance may inform on technology performance by means of simulation. We analyze a real-life industrial experiment on Test-Driven Development (TDD) where variance may impact technology desirability. Evaluating the performance of technologies just based on means-as traditionally done in SE-may be misleading. Technologies that make developers resemble more to each other (i.e., technologies with smaller variances) may be more suitable if the aim is minimizing the risk of adopting them in real practice.

6.7SEJul 18, 2018

Does the performance of TDD hold across software companies and premises? A group of industrial experiments on TDD

Adrian Santos, Janne Jarvinen, Jari Partanen et al.

Test-Driven Development (TDD) has been claimed to increase external software quality. However, the extent to which TDD increases external quality has been seldom studied in industrial experiments. We conduct four industrial experiments in two different companies to evaluate the performance of TDD on external quality. We study whether the performance of TDD holds across premises within the same company and across companies. We identify participant-level characteristics impacting results. Iterative-Test Last (ITL), the reverse approach of TDD, outperforms TDD in three out of four premises. ITL outperforms TDD in both companies. The larger the experience with unit testing and testing tools, the larger the difference in performance between ITL and TDD (in favour of ITL). Technological environment (i.e., programming language and testing tool) seems not to impact results. Evaluating participant-level characteristics impacting results in industrial experiments may ease the understanding of the performance of TDD in realistic settings.

5.2SEAug 30, 2017

Choreography in the embedded systems domain: A systematic literature review

Nebojša Taušan, Jouni Markkula, Pasi Kuvaja et al.

Software companies that develop their products on a basis of service-oriented architecture (SOA) can expect various improvements as a result of choreography. Current choreography practices, however, are not yet used extensively in the embedded systems domain even though SOA is increasingly used in this domain. The objective of this study is to identify current features of the use of choreography in the embedded systems domain for practitioners and researchers by systematically analysing current developments in the scientific literature, strategies for choreography adaption, choreography specification and execution types, and implicit assumptions about choreography. To fulfil this objective, a systematic literature review of scientific publications that focus on the use of choreography in the embedded systems domain was carried out. After screening, 48 publications were selected as primary studies and analysed using thematic synthesis. The main results of the study showed that there are differences in how choreography is used in embedded and non-embedded systems domain. In the embedded systems domain, it is used to capture the service interactions of a single organisation, while, for example, in the enterprise systems domain it captures the service interactions among multiple organisations. Additionally, the results indicate that the use of choreography can lead to improvements in system performance and that the languages that are used for choreography modelling in the embedded systems domain are insufficiently expressive to capture the complexities that are typical in this domain. The study results facilitate the work of practitioners by allowing them to make informed decisions about the applicability of choreography in their organisations.

28.3SENov 27, 2016

Naming the Pain in Requirements Engineering: Contemporary Problems, Causes, and Effects in Practice

D. Méndez Fernández, S. Wagner, M. Kalinowski et al.

Requirements Engineering (RE) has received much attention in research and practice due to its importance to software project success. Its interdisciplinary nature, the dependency to the customer, and its inherent uncertainty still render the discipline difficult to investigate. This results in a lack of empirical data. These are necessary, however, to demonstrate which practically relevant RE problems exist and to what extent they matter. Motivated by this situation, we initiated the Naming the Pain in Requirements Engineering (NaPiRE) initiative which constitutes a globally distributed, bi-yearly replicated family of surveys on the status quo and problems in practical RE. In this article, we report on the qualitative analysis of data obtained from 228 companies working in 10 countries in various domains and we reveal which contemporary problems practitioners encounter. To this end, we analyse 21 problems derived from the literature with respect to their relevance and criticality in dependency to their context, and we complement this picture with a cause-effect analysis showing the causes and effects surrounding the most critical problems. Our results give us a better understanding of which problems exist and how they manifest themselves in practical environments. Thus, we provide a first step to ground contributions to RE on empirical observations which, until now, were dominated by conventional wisdom only.

19.0SENov 18, 2016

A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?

Davide Fucci, Hakan Erdogmus, Burak Turhan et al.

Background: Test-driven development (TDD) is a technique that repeats short coding cycles interleaved with testing. The developer first writes a unit test for the desired functionality, followed by the necessary production code, and refactors the code. Many empirical studies neglect unique process characteristics related to TDD iterative nature. Aim: We formulate four process characteristic: sequencing, granularity, uniformity, and refactoring effort. We investigate how these characteristics impact quality and productivity in TDD and related variations. Method: We analyzed 82 data points collected from 39 professionals, each capturing the process used while performing a specific development task. We built regression models to assess the impact of process characteristics on quality and productivity. Quality was measured by functional correctness. Result: Quality and productivity improvements were primarily positively associated with the granularity and uniformity. Sequencing, the order in which test and production code are written, had no important influence. Refactoring effort was negatively associated with both outcomes. We explain the unexpected negative correlation with quality by possible prevalence of mixed refactoring. Conclusion: The claimed benefits of TDD may not be due to its distinctive test-first dynamic, but rather due to the fact that TDD-like processes encourage fine-grained, steady steps that improve focus and flow.

12.5SEOct 28, 2016

Software Architecture Decision-Making Practices and Challenges: An Industrial Case Study

Sandun Dasanayake, Jouni Markkula, Sanja Aaramaa et al.

Software architecture decision-making is critical to the success of a software system as software architecture sets the structure of the system, determines its qualities, and has far-reaching consequences throughout the system life cycle. The complex nature of the software development context and the importance of the problem has led the research community to develop several techniques, tools, and processes to assist software architects in making better decisions. Despite these effort, the adoption of such systematic approaches appears to be quite limited in practice. In addition, the practitioners are also facing new challenges as different software development methods suggest different approaches for architecture design. In this paper, we study the current software architecture decision-making practices in the industry using a case study conducted among professional software architects in three different companies in Europe. As a result, we identified different software architecture decision-making practices followed by the software teams as well as their reasons for following them, the challenges associated with them, and the possible improvements from the software architects' point of view. Based on that, we recognized that improving software architecture knowledge management can address most of the identified challenges and would result in better software architecture decision-making.