Pekka Abrahamsson

SE
h-index: 67
73 papers
3,389 citations
Novelty: 21%
AI Score: 50

73 Papers

67.5 SE May 19
CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology

Zeeshan Rasheed, Muhammad Waseem, Kai-Kristian Kemell et al.

Context: LLM-based multi-agent systems enable automation and decision support in software development, yet existing studies rely on benchmark datasets offering only binary pass-or-fail results, limiting insight into real-world applicability. Objective: This study empirically investigates the potential and limitations of LLM-based agents in autonomous software development tasks. Method: A two-phase approach was employed: developing a multi-agent system, CodePori, for automated code generation, and conducting participant-based evaluation to assess practical performance. Results: Participant feedback reveals key strengths, challenges, and areas for improvement in LLM-based multi-agent systems, highlighting aspects missed by standard code-generation benchmarks. Conclusions: While LLM-based multi-agent systems show potential for large-scale software development, successful integration requires addressing challenges such as memory limitations, hallucinations, and code smells, alongside a practitioner-centric perspective.

SE Feb 23
Carbon-Aware Governance Gates: An Architecture for Sustainable GenAI Development

Mateen A. Abbasi, Tommi J. Mikkonen, Petri J. Ihantola et al.

The rapid adoption of Generative AI (GenAI) in the software development life cycle (SDLC) increases computational demand, which can raise the carbon footprint of development activities. At the same time, organizations are increasingly embedding governance mechanisms into GenAI-assisted development to support trust, transparency, and accountability. However, these governance mechanisms introduce additional computational workloads, including repeated inference, regeneration cycles, and expanded validation pipelines, increasing energy use and the carbon footprint of GenAI-assisted development. This paper proposes Carbon-Aware Governance Gates (CAGG), an architectural extension that embeds carbon budgets, energy provenance, and sustainability-aware validation orchestration into human-AI governance layers. CAGG comprises three components: (i) an Energy and Carbon Provenance Ledger, (ii) a Carbon Budget Manager, and (iii) a Green Validation Orchestrator, operationalized through governance policies and reusable design patterns.
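As a rough illustration of how a carbon budget could gate validation work, the sketch below combines a tiny provenance ledger with a budget check. It is not the CAGG design from the paper: the class name `CarbonBudgetGate`, its fields, and all numeric values are hypothetical, and real energy and grid-intensity figures would have to come from measurement.

```python
from dataclasses import dataclass, field


@dataclass
class CarbonBudgetGate:
    """Hypothetical gate: admits a step only if it fits the remaining budget."""
    budget_g: float                              # allowed emissions, grams CO2e
    ledger: list = field(default_factory=list)   # (step, grams) provenance entries

    @property
    def spent_g(self) -> float:
        # Total emissions recorded so far.
        return sum(grams for _, grams in self.ledger)

    def admit(self, step: str, energy_kwh: float, intensity_g_per_kwh: float) -> bool:
        """Record the step and return True if it stays within the budget."""
        grams = energy_kwh * intensity_g_per_kwh
        if self.spent_g + grams > self.budget_g:
            return False                         # gate closed: skip or defer the step
        self.ledger.append((step, grams))
        return True


# Illustrative values only (assumed, not taken from the paper).
gate = CarbonBudgetGate(budget_g=120.0)
first = gate.admit("unit-tests", energy_kwh=0.5, intensity_g_per_kwh=100.0)
second = gate.admit("regenerate", energy_kwh=1.0, intensity_g_per_kwh=100.0)
```

Here the first step fits the budget and is logged, while the second would exceed it and is refused without being recorded.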

62.2 AI Apr 17
Agentic Frameworks for Reasoning Tasks: An Empirical Study

Zeeshan Rasheed, Abdul Malik Sami, Muhammad Waseem et al.

Recent advances in agentic frameworks have enabled AI agents to perform complex reasoning and decision-making. However, evidence comparing their reasoning performance, efficiency, and practical suitability remains limited. To address this gap, we empirically evaluate 22 widely used agentic frameworks across three reasoning benchmarks: BBH, GSM8K, and ARC. The frameworks were selected from 1,200 GitHub repositories collected between January 2023 and July 2025 and organized into a taxonomy based on architectural design. We evaluated them under a unified setting, measuring reasoning accuracy, execution time, computational cost, and cross-benchmark consistency. Our results show that 19 of the 22 frameworks completed all three benchmarks. Among these, 12 showed stable performance, with mean accuracy of 74.6-75.9%, execution time of 4-6 seconds per task, and cost of 0.14-0.18 cents per task. Poorer results were mainly caused by orchestration problems rather than reasoning limits. For example, Camel failed to complete BBH after 11 days because of uncontrolled context growth, while Upsonic consumed USD 1,434 in one day because repeated extraction failures triggered costly retries. AutoGen and Mastra also exhausted API quotas through iterative interactions that increased prompt length without improving results. We also found a sharp drop in mathematical reasoning. Mean accuracy on GSM8K was 44.35%, compared with 89.80% on BBH and 89.56% on ARC. Overall, this study provides the first large-scale empirical comparison of agentic frameworks for reasoning-intensive software engineering tasks and shows that framework selection should prioritize orchestration quality, especially memory control, failure handling, and cost management.
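The runaway-retry failure described above suggests a simple guard: cap both the number of attempts and the total spend. The following is a minimal sketch under assumed names (`run_with_budget`, a `ValueError` standing in for an extraction failure, a flat cost per call); it is not taken from any of the evaluated frameworks.

```python
def run_with_budget(task, max_attempts: int, cost_per_call: float, budget: float):
    """Call task() until it succeeds, attempts run out, or the budget is reached.

    Returns (result, total_cost); result is None if no attempt succeeded.
    """
    spent = 0.0
    for _ in range(max_attempts):
        if spent + cost_per_call > budget:
            break                      # stop before overspending
        spent += cost_per_call
        try:
            return task(), spent
        except ValueError:             # hypothetical "extraction failed" signal
            continue
    return None, spent


def always_fails():
    # Stand-in for an agent step whose output can never be parsed.
    raise ValueError("could not extract answer")


result, cost = run_with_budget(always_fails, max_attempts=10, cost_per_call=1.0, budget=3.0)
```

With these illustrative numbers the loop stops after three calls even though ten attempts were allowed, because a fourth call would exceed the budget.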

SE Oct 21, 2024 Code
Developing Retrieval Augmented Generation (RAG) based LLM Systems from PDFs: An Experience Report

Ayman Asad Khan, Md Toufique Hasan, Kai Kristian Kemell et al.

This paper presents an experience report on the development of Retrieval Augmented Generation (RAG) systems using PDF documents as the primary data source. The RAG architecture combines generative capabilities of Large Language Models (LLMs) with the precision of information retrieval. This approach has the potential to redefine how we interact with and augment both structured and unstructured knowledge in generative models to enhance transparency, accuracy, and contextuality of responses. The paper details the end-to-end pipeline, from data collection and preprocessing to retrieval indexing and response generation, highlighting technical challenges and practical solutions. We aim to offer insights to researchers and practitioners developing similar systems using two distinct approaches: OpenAI's Assistant API with GPT Series and Llama's open-source models. The practical implications of this research lie in enhancing the reliability of generative AI systems in various sectors where domain-specific knowledge and real-time information retrieval are important. The Python code used in this work is also available at: https://github.com/GPT-Laboratory/RAG-LLM-Development-Guidebook-from-PDFs.
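To make the retrieval-then-generate idea concrete, here is a toy sketch of the retrieval and prompt-assembly steps. It is not the code from the linked repository: word counts stand in for learned embeddings, the function names and sample chunks are invented for illustration, and PDF text extraction and the final LLM call are left out.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble a grounded prompt from the retrieved context."""
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical chunks, as if already extracted from a PDF.
chunks = [
    "Invoices are due within 30 days of receipt.",
    "The warranty covers parts for two years.",
]
top = retrieve("how long is the warranty", chunks)
prompt = build_prompt("how long is the warranty", chunks)
```

In a real pipeline the `embed` function would be replaced by an embedding model and the chunks stored in a vector index; the ranking and prompt assembly keep the same shape.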

48.2 SE Mar 10
Context Before Code: An Experience Report on Vibe Coding in Practice

Md Nasir Uddin Shuvo, Md Aidul Islam, Md Mahade Hasan et al.

Code-generating tools are increasingly used in software development, yet experience reports on conversational "vibe coding" under production constraints remain limited. This paper presents an experience report from a small full-stack team that applied contextual prompting and explicit architectural constraints to build (i) a multi-project agent learning platform designed for sustained, production-oriented use and (ii) an academic retrieval-augmented generation system. The agent platform supports multiple isolated projects, each with structured memory and background processing, thereby enforcing project-level isolation. The RAG system provides citation-grounded answers, role-based access control, and evaluation tracking. Across both systems, vibe coding accelerated scaffolding and integration. However, the generated code often under-specified isolation rules and infrastructure constraints when these were not explicitly defined. Consequently, aspects such as multi-tenancy, access control, memory policies, and asynchronous processing required deliberate architectural design and verification. We observe a shift in engineering effort from boilerplate implementation toward constraint specification and enforcement auditing. We also identify recurring architectural "non-delegation zones" where conversational code generation remains insufficient for production reliability.

CL Feb 2
Towards AI Evaluation in Domain-Specific RAG Systems: The AgriHubi Case Study

Md. Toufique Hasan, Ayman Asad Khan, Mika Saari et al.

Large language models show promise for knowledge-intensive domains, yet their use in agriculture is constrained by weak grounding, English-centric training data, and limited real-world evaluation. These issues are amplified for low-resource languages, where high-quality domain documentation exists but remains difficult to access through general-purpose models. This paper presents AgriHubi, a domain-adapted retrieval-augmented generation (RAG) system for Finnish-language agricultural decision support. AgriHubi integrates Finnish agricultural documents with open PORO family models and combines explicit source grounding with user feedback to support iterative refinement. Developed over eight iterations and evaluated through two user studies, the system shows clear gains in answer completeness, linguistic accuracy, and perceived reliability. The results also reveal practical trade-offs between response quality and latency when deploying larger models. This study provides empirical guidance for designing and evaluating domain-specific RAG systems in low-resource language settings.

SE Feb 25
LLM-Based Multi-Agent Systems for Code Generation: A Multi-Vocal Literature Review

Zeeshan Rasheed, Muhammad Waseem, Kai-Kristian Kemell et al.

Large Language Models (LLMs) have enabled multi-agent systems to perform autonomous code generation for complex tasks. Despite the recent growth in research and industrial applications in this area, there is little work on synthesizing evidence from both academic and industrial sources to capture the current state of research on LLM-based multi-agent systems for code generation. To this end, we conducted a Multi-Vocal Literature Review (MLR), combining insights from both academia and industry, including peer-reviewed studies and grey literature. The aim of this study is to systematically synthesize and analyze existing knowledge on LLM-based multi-agent systems for code generation. Specifically, the review examines the motivations for their use, employed benchmarks and models, key challenges, proposed solutions, and potential directions for future research. We selected and reviewed 114 studies, and the key findings are: 1) the identified reasons for adopting multi-agent systems for code generation were classified into nine categories; 2) the models and evaluation benchmarks utilized across the studies were systematically analyzed to provide a structured overview of commonly adopted LLM configurations and assessment practices; 3) the reported challenges and corresponding solutions were synthesized into six main categories and 26 subcategories; and 4) future research directions were identified and organized into six main categories and 18 subcategories. The results of this MLR will assist researchers and practitioners in pursuing further studies and supporting the real-world adoption of multi-agent systems in industrial settings.

CY Aug 16, 2018 Code
Do software firms collaborate or compete? A model of coopetition in community-initiated OSS projects

Anh Nguyen-Duc, Daniela S. Cruzes, Snarby Terje et al.

[Background] An increasing number of commercial firms are participating in Open Source Software (OSS) projects to reduce their development cost and increase technical innovativeness. When collaborating with other firms whose sought values conflict with their own, firms may behave uncooperatively, leading to harmful impacts on the common goal. [Aim] This study explores how software firms both collaborate and compete in OSS projects. [Method] We adopted a mixed research method on three OSS projects. [Result] We found that commercial firms participating in community-initiated OSS projects collaborate in various ways across organizational boundaries. While most firms contribute little, a small number of firms are very active and account for large proportions of contributions. We proposed a conceptual model to explain coopetition among software firms in OSS projects. The model shows that the two aspects of coopetition can be managed at the same time based on firm gatekeepers. [Conclusion] Firms need to operationalize their coopetition strategies to maximize value gained from participating in OSS projects.

41.3 SE Apr 29
TDD Governance for Multi-Agent Code Generation via Prompt Engineering

Tarlan Hasanli, Shahbaz Siddeeq, Bishwash Khanal et al.

Large language models (LLMs) accelerate software development but often exhibit instability, non-determinism, and weak adherence to development discipline in unconstrained workflows. While test-driven development (TDD) provides a structured Red-Green-Refactor process, existing LLM-based approaches typically use tests as auxiliary inputs rather than enforceable process constraints. We present an AI-native TDD framework that operationalizes classical TDD principles as structured prompt-level and workflow-level governance mechanisms. Extracted principles are formalized in a machine-readable manifesto and distributed across planning, generation, repair, and validation stages within a layered architecture that separates model proposal from deterministic engine authority. The system enforces phase ordering, bounded repair loops, validation gates, and atomic mutation control to improve stability and reproducibility. We describe the architecture and discuss encoding software engineering discipline directly into prompt orchestration, which we think offers a promising direction for reliable LLM-assisted development.
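Phase ordering with a bounded repair loop can be pictured as a small deterministic driver around a model that only proposes. The sketch below is an assumption-laden illustration, not the framework described in the abstract: `run_cycle`, `propose`, and `tests_pass` are hypothetical names, and the candidates are plain strings rather than real code.

```python
def run_cycle(propose, tests_pass, max_repairs: int = 3):
    """Drive one Red-Green-Refactor cycle with deterministic gates.

    propose(phase, attempt) -> candidate  (the model's proposal; hypothetical)
    tests_pass(candidate)   -> bool       (deterministic validation)
    Returns the list of (phase, outcome) steps taken.
    """
    log = []
    # Red: the new test must fail before any implementation is accepted.
    test = propose("red", 0)
    if tests_pass(test):
        return log + [("red", "rejected: test already passes")]
    log.append(("red", "ok"))
    # Green: bounded repair loop; the engine, not the model, decides when to stop.
    for attempt in range(max_repairs):
        code = propose("green", attempt)
        if tests_pass(code):
            log.append(("green", f"ok after {attempt + 1} attempt(s)"))
            break
    else:
        return log + [("green", "halted: repair budget exhausted")]
    # Refactor: the change is kept only if validation still passes.
    refactored = propose("refactor", 0)
    log.append(("refactor", "ok" if tests_pass(refactored) else "reverted"))
    return log


def fake_propose(phase, attempt):
    # Stand-in for a model: the second green attempt is the first that works.
    green = "fix" if attempt == 1 else "bug"
    return {"red": "failing-test", "green": green, "refactor": "fix-clean"}[phase]


steps = run_cycle(fake_propose, lambda candidate: candidate.startswith("fix"))
```

The design choice illustrated here is the separation of authority: the model supplies candidates, while ordering, retry limits, and acceptance stay in ordinary deterministic code.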

CY Oct 25, 2024
Can We Trust AI Agents? A Case Study of an LLM-Based Multi-Agent System for Ethical AI

José Antonio Siqueira de Cerqueira, Mamia Agbese, Rebekah Rousi et al.

AI-based systems, including Large Language Models (LLMs), impact millions by supporting diverse tasks but face issues like misinformation, bias, and misuse. AI ethics is crucial as new technologies and concerns emerge, but objective, practical guidance remains debated. This study examines the use of LLMs for AI ethics in practice, assessing how LLM trustworthiness-enhancing techniques affect software development in this context. Using the Design Science Research (DSR) method, we identify techniques for LLM trustworthiness: multi-agents, distinct roles, structured communication, and multiple rounds of debate. We design a multi-agent prototype LLM-MAS, where agents engage in structured discussions on real-world AI ethics issues from the AI Incident Database. We evaluate the prototype across three case scenarios using thematic analysis, hierarchical clustering, comparative (baseline) studies, and running source code. The system generates approximately 2,000 lines of code per case, compared to only 80 lines in baseline trials. Discussions reveal terms like bias detection, transparency, accountability, user consent, GDPR compliance, fairness evaluation, and EU AI Act compliance, showing this prototype's ability to generate extensive source code and documentation addressing often overlooked AI ethics issues. However, practical challenges in source code integration and dependency management may limit its use by practitioners.

CL Feb 27, 2025
Mapping Trustworthiness in Large Language Models: A Bibliometric Analysis Bridging Theory to Practice

José Siqueira de Cerqueira, Kai-Kristian Kemell, Rebekah Rousi et al.

The rapid proliferation of Large Language Models (LLMs) has raised significant trustworthiness and ethical concerns. Despite the widespread adoption of LLMs across domains, there is still no clear consensus on how to define and operationalise trustworthiness. This study aims to bridge the gap between theoretical discussion and practical implementation by analysing research trends, definitions of trustworthiness, and practical techniques. We conducted a bibliometric mapping analysis of 2,006 publications from Web of Science (2019-2025) using Bibliometrix, and manually reviewed 68 papers. We found a shift from traditional AI ethics discussion to LLM trustworthiness frameworks. We identified 18 different definitions of trust/trustworthiness, with transparency, explainability and reliability emerging as the most common dimensions. We identified 20 strategies to enhance LLM trustworthiness, with fine-tuning and retrieval-augmented generation (RAG) being the most prominent. Most of the strategies are developer-driven and applied during the post-training phase. Several authors propose fragmented terminologies rather than unified frameworks, leading to the risk of "ethics washing," where ethical discourse is adopted without a genuine regulatory commitment. Our findings highlight persistent gaps between theoretical taxonomies and practical implementation and the crucial role of the developer in operationalising trust, and they call for standardised frameworks and stronger regulatory measures to enable trustworthy and ethical deployment of LLMs.

SE Jun 25, 2025
Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation

Md Toufique Hasan, Muhammad Waseem, Kai-Kristian Kemell et al.

Retrieval-Augmented Generation (RAG) systems are emerging as a key approach for grounding Large Language Models (LLMs) in external knowledge, addressing limitations in factual accuracy and contextual relevance. However, there is a lack of empirical studies that report on the development of RAG-based implementations grounded in real-world use cases, evaluated through general user involvement, and accompanied by systematic documentation of lessons learned. This paper presents five domain-specific RAG applications developed for real-world scenarios across governance, cybersecurity, agriculture, industrial research, and medical diagnostics. Each system incorporates multilingual OCR, semantic retrieval via vector embeddings, and domain-adapted LLMs, deployed through local servers or cloud APIs to meet distinct user needs. A web-based evaluation involving a total of 100 participants assessed the systems across six dimensions: (i) Ease of Use, (ii) Relevance, (iii) Transparency, (iv) Responsiveness, (v) Accuracy, and (vi) Likelihood of Recommendation. Based on user feedback and our development experience, we documented twelve key lessons learned, highlighting technical, operational, and ethical challenges affecting the reliability and usability of RAG systems in practice.

SE Jun 25, 2025
AI and Agile Software Development: From Frustration to Success -- XP2025 Workshop Summary

Tomas Herda, Victoria Pichler, Zheying Zhang et al.

The full-day workshop on AI and Agile at XP 2025 convened a diverse group of researchers and industry practitioners to address the practical challenges and opportunities of integrating Artificial Intelligence into Agile software development. Through interactive sessions, participants identified shared frustrations related to integrating AI into Agile Software Development practices, including challenges with tooling, governance, data quality, and critical skill gaps. These challenges were systematically prioritized and analyzed to uncover root causes. The workshop culminated in the collaborative development of a research roadmap that pinpoints actionable directions for future work, including both immediate solutions and ambitious long-term goals. The key outcome is a structured agenda designed to foster joint industry-academic efforts to move from identified frustrations to successful implementation.

SE Dec 11, 2025
Vibe Coding in Practice: Flow, Technical Debt, and Guidelines for Sustainable Use

Muhammad Waseem, Aakash Ahmad, Kai-Kristian Kemell et al.

Vibe Coding (VC) is a form of software development assisted by generative AI, in which developers describe the intended functionality or logic via natural language prompts, and the AI system generates the corresponding source code. VC can be leveraged for rapid prototyping or developing Minimum Viable Products (MVPs); however, it may introduce several risks throughout the software development life cycle. Based on our experience from several internally developed MVPs and a review of recent industry reports, this article analyzes the flow-debt trade-offs associated with VC. The flow-debt trade-off arises when seamless code generation leads to the accumulation of technical debt through architectural inconsistencies, security vulnerabilities, and increased maintenance overhead. These issues originate from process-level weaknesses, biases in model training data, a lack of explicit design rationale, and a tendency to prioritize quick code generation over human-driven iterative development. Based on our experiences, we identify and explain how current model, platform, and hardware limitations contribute to these issues, and propose countermeasures to address them, informing research and practice towards more sustainable VC approaches.

SE Aug 28, 2025
AI and Agile Software Development: A Research Roadmap from the XP2025 Workshop

Zheying Zhang, Tomas Herda, Victoria Pichler et al.

This paper synthesizes the key findings from a full-day XP2025 workshop on "AI and Agile: From Frustration to Success", held in Brugg-Windisch, Switzerland. The workshop brought together over 30 interdisciplinary academic researchers and industry practitioners to tackle the concrete challenges and emerging opportunities at the intersection of Generative Artificial Intelligence (GenAI) and agile software development. Through structured, interactive breakout sessions, participants identified shared pain points like tool fragmentation, governance, data quality, and critical skills gaps in AI literacy and prompt engineering. These issues were further analyzed, revealing underlying causes and cross-cutting concerns. The workshop concluded by collaboratively co-creating a multi-thematic research roadmap, articulating both short-term, implementable actions and visionary, long-term research directions. This cohesive agenda aims to guide future investigation and drive the responsible, human-centered integration of GenAI into agile practices.

LG Feb 18, 2025
Anomaly Detection in Smart Power Grids with Graph-Regularized MS-SVDD: a Multimodal Subspace Learning Approach

Thomas Debelle, Fahad Sohrab, Pekka Abrahamsson et al.

In this paper, we address an anomaly detection problem in smart power grids using Multimodal Subspace Support Vector Data Description (MS-SVDD). This approach aims to leverage better feature relations by considering the data as coming from different modalities. These data are projected into a shared lower-dimensionality subspace which aims to preserve their inner characteristics. To supplement the previous work on this subject, we introduce novel multimodal graph-embedded regularizers that leverage graph information for every modality to enhance the training process, and we consider an improved training equation that allows us to maximize or minimize each modality according to the specified criteria. We apply this regularized graph-embedded model on a 3-modalities dataset after having generalized MS-SVDD algorithms to any number of modalities. To set up our application, we propose a whole preprocessing procedure to extract One-Class Classification training instances from time-bounded event time series that are used to evaluate both the reliability and earliness of our model for Event Detection.

SE Mar 14, 2024
LLM-based agents for automating the enhancement of user story quality: An early report

Zheying Zhang, Maruf Rayhan, Tomas Herda et al.

In agile software development, maintaining high-quality user stories is crucial, but also challenging. This study explores the use of large language models to automatically improve user story quality in Austrian Post Group IT agile teams. We developed a reference model for an Autonomous LLM-based Agent System and implemented it at the company. The quality of user stories in the study and the effectiveness of these agents for user story quality improvement were assessed by 11 participants across six agile teams. Our findings demonstrate the potential of LLMs in improving user story quality, contributing to the research on AI's role in agile development, and providing a practical example of the transformative impact of AI in an industry setting.

CY Jan 12, 2024
Business and ethical concerns in domestic Conversational Generative AI-empowered multi-robot systems

Rebekah Rousi, Hooman Samani, Niko Mäkitalo et al.

Business and technology are intricately connected through logic and design. They are equally sensitive to societal changes and may be devastated by scandal. Cooperative multi-robot systems (MRSs) are on the rise, allowing robots of different types and brands to work together in diverse contexts. Generative artificial intelligence has been a dominant topic in recent artificial intelligence (AI) discussions due to its capacity to mimic humans through the use of natural language and the production of media, including deep fakes. In this article, we focus specifically on the conversational aspects of generative AI, and hence use the term Conversational Generative artificial intelligence (CGI). Like MRSs, CGIs have enormous potential for revolutionizing processes across sectors and transforming the way humans conduct business. From a business perspective, cooperative MRSs alone, with potential conflicts of interest, privacy practices, and safety concerns, require ethical examination. MRSs empowered by CGIs demand multi-dimensional and sophisticated methods to uncover imminent ethical pitfalls. This study focuses on ethics in CGI-empowered MRSs while reporting the stages of developing the MORUL model.

SE Feb 11, 2022
Software Architecture for Quantum Computing Systems -- A Systematic Review

Arif Ali Khan, Aakash Ahmad, Muhammad Waseem et al.

Quantum computing systems rely on the principles of quantum mechanics to perform a multitude of computationally challenging tasks more efficiently than their classical counterparts. The architecture of software-intensive systems can empower architects who can leverage architecture-centric processes, practices, description languages, etc., to model, develop, and evolve quantum computing software (quantum software for short) at higher abstraction levels. We conducted a systematic literature review (SLR) to investigate (i) architectural process, (ii) modeling notations, (iii) architecture design patterns, (iv) tool support, and (v) challenging factors for quantum software architecture. Results of the SLR indicate that quantum software represents a new genre of software-intensive systems; however, existing processes and notations can be tailored to derive the architecting activities and develop modeling languages for quantum software. Quantum bits (Qubits) mapped to Quantum gates (Qugates) can be represented as architectural components and connectors that implement quantum software. Tool-chains can incorporate reusable knowledge and human roles (e.g., quantum domain engineers, quantum code developers) to automate and customize the architectural process. Results of this SLR can facilitate researchers and practitioners to develop new hypotheses to be tested, derive reference architectures, and leverage architecture-centric principles and practices to engineer emerging and next generations of quantum software.

SE Feb 10, 2022
Work-from-home and its implication for project management, resilience and innovation -- a global survey on software companies

Anh Nguyen-Duc, Dron Khanna, Des Greer et al.

[Context] The COVID-19 pandemic has had a disruptive impact on how people work and collaborate across all global economic sectors, including the software business. While remote working is not new for software engineers, forced work-from-home situations come with constraints, limitations, and opportunities for individuals, software teams, and software companies. As the "new normal" for working might be based on the current state of Work From Home (WFH), it is useful to understand what has happened and learn from that. [Objective] The goal of this study is to gain insights on how the WFH environment impacts software projects and software companies. We are also interested in understanding if the impact differs between software startups and established companies. [Method] We conducted a global-scale, cross-sectional survey during spring and summer 2021. Our results are based on quantitative and qualitative analysis of 297 valid responses. [Results] We observed a mixed perception of the impact of WFH on software project management, resilience, and innovation. Certain patterns on WFH, control and coordination mechanisms and collaborative tools are observed globally. We find that team, agility and leadership are the three most important factors for achieving resilience during the pandemic. Although startups do not perceive the impact of WFH differently, there is a difference between engineers who work in a small team context and those who work in a large team context. [Conclusion] The results suggest a contingency approach in studying and improving WFH practices and environment in the future software industry.

LG Dec 17, 2021
Quality of Data in Machine Learning

Antti Kariluoto, Arto Pärnänen, Joni Kultanen et al.

A common assumption exists according to which machine learning models improve their performance when they have more data to learn from. In this study, the authors examined this assumption by performing an empirical experiment utilizing novel vocational student data. The experiment compared different machine learning algorithms while varying the amount of data and the feature combinations available for training and testing the models. The experiment revealed that an increase in data records or their sample frequency does not immediately lead to significant increases in model accuracy or performance; however, the variance of accuracies does diminish in the case of ensemble models. A similar phenomenon was observed when increasing the number of input features for the models. The study refutes the starting assumption and states that, in this case, the significance of data lies in its quality rather than its quantity.

SD Aug 12, 2021
Deep Neural Network Voice Activity Detector for Downsampled Audio Data: An Experiment Report

Mikael Ovaska, Joni Kultanen, Teemu Autto et al.

Sociometric badges are an emerging technology for studying how teams interact in physical places. Audio data recorded by sociometric badges is often downsampled so as not to record the discussions of the badge holders. To gain more information about interactions inside teams with sociometric badges, a Voice Activity Detector (VAD) is deployed to measure the verbal activity of the interaction. Detecting voice activity from downsampled audio data is challenging because downsampling destroys information in the data. We developed a VAD using deep learning techniques that achieves only moderate accuracy in a low-noise meeting setting and across variable noise levels, despite excellent validation performance. Experiences and lessons learned while developing the VAD are discussed.

SE Mar 14, 2021
The entrepreneurial logic of startup software development: A study of 40 software startups

Anh Nguyen-Duc, Kai-Kristian Kemell, Pekka Abrahamsson

Context: Software startups are an essential source of innovation and software-intensive products. The need to understand product development in startups and to provide relevant support are highlighted in software research. While state-of-the-art literature reveals how startups develop their software, the reasons why they adopt these activities are underexplored. Objective: This study investigates the tactics behind software engineering (SE) activities by analyzing key engineering events during startup journeys. We explore how entrepreneurial mindsets may be associated with SE knowledge areas and with each startup case. Method: Our theoretical foundation is based on causation and effectuation models. We conducted semi-structured interviews with 40 software startups. We used two-round open coding and thematic analysis to describe and identify entrepreneurial software development patterns. Additionally, we calculated an effectuation index for each startup case. Results: We identified 621 events merged into 32 codes of entrepreneurial logic in SE from the sample. We found a systemic occurrence of the logic in all areas of SE activities. Minimum Viable Product (MVP), Technical Debt (TD), and Customer Involvement (CI) tend to be associated with effectual logic, while testing activities at different levels are associated with causal logic. The effectuation index revealed that startups are either effectuation-driven or mixed-logics-driven. Conclusions: Software startups fall into two types that differentiate between how traditional SE approaches may apply to them. Effectuation seems the most relevant and essential model for explaining and developing suitable SE practices for software startups.

SEFeb 11, 2021
Business Model Canvas Should Pay More Attention to the Software Startup Team

Kai-Kristian Kemell, Atte Elonen, Mari Suoranta et al.

Business Model Canvas (BMC) is a tool widely used to describe startup business models. Despite the various business aspects described, BMC places little emphasis on team-related factors. The importance of team-related factors in software development has been acknowledged widely in the literature. While not as extensively studied, the importance of teams in software startups is also known both in the literature and among practitioners. In this paper, we propose potential changes to BMC to have the tool better reflect the importance of the team, especially in a software startup environment. Based on a literature review, we identify various components related to the team, which we then further support with empirical data. We do so by means of a qualitative case study of five startups.

SEFeb 11, 2021
Software Startup Practices -- Software Development in Startups through the Lens of the Essence Theory of Software Engineering

Kai-Kristian Kemell, Ville Ravaska, Anh Nguyen-Duc et al.

Software startups continue to be important drivers of the economy globally. As the initial investment required to found a new software company becomes smaller and smaller as a result of technological advances such as cloud technology, increasing numbers of new software startups are born. Typically, the main argument for studying software startups is that they differ from mature software organizations in various ways, thus making the findings of many existing studies not directly applicable to them. How, exactly, software startups really differ from other types of software organizations is an ongoing debate. In this paper, we seek to better understand how software startups differ from mature software organizations in terms of development practices. Past studies have primarily studied method use, and in comparison, we take a more atomic approach by focusing on practices. Utilizing the Essence Theory of Software Engineering as a framework, we split these practices into categories for analysis while simultaneously evaluating the suitability of the theory for the context of software startups. Based on the results, we propose changes to the Essence Theory of Software Engineering for it to better fit the startup context.

SENov 20, 2019
Product Innovation through Internal Startup in Large Software Companies: a Case Study

Henry Edison, Xiaofeng Wang, Pekka Abrahamsson

Product innovation is a risky activity, but when successful, it enables large software companies to accrue high profits and leapfrog the competition. Internal startups have been promoted as one way to foster product innovation in large companies, which allows them to innovate as startups do. However, internal startups in large companies are challenging endeavours despite the promised benefits. How large software companies can leverage internal startups in software product innovation is not fully understood due to the scarcity of relevant studies. Based on a conceptual framework that combines elements from the Lean startup approach and an internal corporate venturing model, we conducted a case study of a large software company to examine how a new product was developed through the internal startup effort and struggled to achieve the desired outcomes set by the management. As a result, the conceptual framework was further developed into a Lean startup-enabled new product development model for large software companies.

SEMar 26, 2019
Agile Software Development Method, A Comparative Review

Pekka Abrahamsson, Nilay Oza, Mikko T. Siponen

Although agile software development methods have caught the attention of software engineers and researchers worldwide, scientific research still remains quite scarce. The aim of this study is to order and make sense of the different agile approaches that have been proposed. This comparative review is performed from the standpoint of using the following features as the analytical perspectives: project management support, life-cycle coverage, type of practical guidance, adaptability in actual use, type of research objectives and existence of empirical evidence. The results show that agile software development methods cover, without offering any rationale, different phases of the software development life-cycle and that most of these methods fail to provide adequate project management support. Moreover, quite a few methods continue to offer little concrete guidance on how to use their solutions or how to adapt them in different development situations. Empirical evidence after ten years of application remains quite limited. Based on the results, new directions on agile methods are outlined.

SEMar 26, 2019
The Personal Software Process, Experiences from Denmark

Pekka Abrahamsson, Karlheinz Kautz

Software process improvement (SPI) research and practice is transforming from the traditional large-scale assessment-based improvement initiatives into smaller-sized, tailored initiatives where the emphasis is set on the development personnel and their personal abilities. The personal software process (PSP) is a method for improving the personal capabilities of a single software engineer. This paper contributes to the body of knowledge within this area by reporting experiences from Denmark. The results indicate an improvement in the effort estimation skills and a significant increase in the resulting product quality in terms of reduced total defect density. The data shows that with relatively small effort (i.e., 10%) used in defect prevention activities (i.e., design and code reviews) almost one third of all defects were removed and consequently the time required for the testing was cut by 50%. Based on this data the use of the PSP method in the software industry is discussed.

SEMar 26, 2019
Commitment to Software Process Improvement - Development of Diagnostic Tool to Facilitate Improvement

Pekka Abrahamsson

This paper suggests that by operationalizing the concept of commitment in the shape of a model, a new insight is provided in improving software processes - a more human-centered approach as opposed to the various technical approaches available. In doing so the SPI managers/change agents are able to better plan the software process improvement initiative and benchmark successful projects (as well as failed ones). Results from five interviews with SPI professionals on the proposed Behavior-based Commitment Model are reported, together with early results from the empirical test in 14 software process improvement projects. Early results suggest that the behaviors introduced in the model are relevant in SPI initiatives, the use of the model raises awareness of the people issues in improving processes, and the model could be used alongside CMM, SPICE or other process improvement models. Keywords: software process improvement, commitment, diagnostic tool, self-perception theory.

SEMar 22, 2019
Commitment Nets in Software Process Improvement

Pekka Abrahamsson

Several studies have revealed the fact that nearly two-thirds of all software process improvement (SPI) efforts have failed or have at least fallen short of expectations. Literature and practice have shown that commitment to SPI at all organizational levels is essential for the success of any SPI endeavor. A research model for studying the existence, development and interplay of SPI-related commitment is introduced in this paper. This study suggests that software organizations operate through strategic, operational and personal commitment nets. These nets consist of actors, drivers, concerns, actions, commitment, and outcomes. The commitment nets model is applied in a study of four industrial SPI initiatives. The results from two of these cases are reported here. The results show that SPI is driven through the formation and reformation of commitment nets. The contents of strategic, operational and personal commitment nets are laid out and implications are discussed.

SESep 24, 2018
The Essence Theory of Software Engineering - Large-Scale Classroom Experiences from 450+ Software Engineering BSc Students

Kai-Kristian Kemell, Anh Nguyen-Duc, Xiaofeng Wang et al.

Software Engineering as an industry is highly diverse in terms of development methods and practices. Practitioners employ a myriad of methods and tend to further tailor them by e.g. omitting some practices or rules. This diversity in development methods poses a challenge for software engineering education, creating a gap between education and industry. General theories such as the Essence Theory of Software Engineering can help bridge this gap by presenting software engineering students with higher-level frameworks upon which to build an understanding of software engineering methods and practical project work. In this paper, we study Essence in an educational setting to evaluate its usefulness for software engineering students while also investigating barriers to its adoption in this context. To this end, we observe 102 student teams utilize Essence in practical software engineering projects during a semester-long, project-based course.

SESep 23, 2018
Gamifying the Escape from the Engineering Method Prison - An Innovative Board Game to Teach the Essence Theory to Future Project Managers and Software Engineers

Kai-Kristian Kemell, Juhani Risku, Arthur Evensen et al.

Software Engineering is an engineering discipline but lacks a solid theoretical foundation. One effort in remedying this situation has been the SEMAT Essence specification. Essence consists of a language for modeling Software Engineering (SE) practices and methods and a kernel containing what its authors describe as being elements that are present in every software development project. In practice, it is a method-agnostic project management tool for SE projects. Using the language of the specification, Essence can be used to model any software development method or practice. Thus, the specification can potentially be applied to any software development context, making it a powerful tool. However, due to the manual work and the learning process involved in modeling practices with Essence, its initial adoption can be taxing for development teams. Due to the importance of project management in SE projects, new project management tools such as Essence are valuable, and facilitating their adoption is consequently important. To tackle this issue in the case of Essence, we present a game-based approach to teaching the use of Essence. In this paper, we gamify the learning process by means of an innovative board game. The game is empirically validated in a study involving students from the IT faculty of University of Jyväskylä (n=61). Based on the results, we report the effectiveness of the game-based approach to teaching both Essence and SE project work.

AISep 19, 2018
The Key Concepts of Ethics of Artificial Intelligence - A Keyword based Systematic Mapping Study

Ville Vakkuri, Pekka Abrahamsson

The growing influence and decision-making capacities of autonomous systems and Artificial Intelligence in our lives force us to consider the values embedded in these systems. But how should ethics be implemented in these systems? In this study, the solution is seen in philosophical conceptualization as a framework for forming a practical implementation model for the ethics of AI. To take the first steps toward conceptualization, the main concepts used in the field need to be identified. A keyword-based Systematic Mapping Study (SMS) on the keywords used in AI and ethics was conducted to help in identifying, defining and comparing the main concepts used in current AI ethics discourse. Out of 1062 papers retrieved, the SMS discovered 37 re-occurring keywords in 83 academic papers. We suggest that the focus on finding keywords is the first step in guiding and providing direction for future research in the AI ethics field.

SEAug 16, 2018
A preliminary study of agility in business and production - Cases of early-stage hardware startups

Anh Nguyen Duc, Xiaofang Weng, Pekka Abrahamsson

[Context] Advancement in technologies, the popularity of small-batch manufacturing and the recent trend of investing in hardware startups are among the factors leading to the rise of hardware startups nowadays. It is essential for hardware startups to be not only agile in developing their business but also efficient in developing the right products. [Objective] We investigate how hardware startups achieve agility when developing their products in early stages. [Methods] Qualitative research is conducted with data from 20 hardware startups. [Result] Preliminary results show that agile development is known to hardware entrepreneurs; however, its adoption is limited. We also found tactics in four domains, (1) strategy, (2) personnel, (3) artifact and (4) resource, that enable hardware startups to be agile in their early-stage business and product development. [Conclusions] Agile methodologies should be adopted with consideration of the specific features of hardware development, such as up-front design and vendor dependencies.

SEAug 8, 2018
Essencery - A Tool for Essentializing Software Engineering Practices

Arthur Evensen, Kai-Kristian Kemell, Xiaofeng Wang et al.

Software Engineering practitioners work using highly diverse methods and practices, and general theories in software engineering are lacking. One attempt at creating a common ground in the area of software engineering methodologies has been the Essence Theory of Software Engineering, which can be considered a method-agnostic project management tool for software engineering. Essence supports the use of any development practices and provides a framework for building a suitable method for any software engineering context. However, Essence presently suffers from low practitioner adoption that is partially considered to be caused by a lack of proper tooling. In this paper, we present Essencery, a tool for essentializing software engineering methods and practices using the Essence graphical syntax. Essencery aims to facilitate adoption of Essence among potential future users. We present an empirical evaluation of the tool by means of a qualitative, quasi-formal experiment and, based on the experiment, confirm that the tool is easy to use and useful for its intended purpose.

CYFeb 23, 2018
Lean Internal Startups for Software Product Innovation in Large Companies: Enablers and Inhibitors

Henry Edison, Nina M. Smørsgård, Xiaofeng Wang et al.

To compete in this age of disruption, large companies cannot rely on cost efficiency, lead time reduction and quality improvement. They are now looking for ways to innovate like startups. Meanwhile, the awareness and use of the Lean startup approach have grown rapidly amongst the software startup community in recent years. This study investigates how Lean internal startup facilitates software product innovation in large companies and identifies its enablers and inhibitors. A multiple case study approach is followed in the investigation. Two software product innovation projects from two large companies are examined, using a conceptual framework that is based on the method-in-action framework and extended with the previously developed Lean-Internal Corporate Venture model. Seven face-to-face in-depth interviews with employees in different roles are conducted. Within-case analysis and cross-case comparison are applied to draw the findings from the cases. A generic process flow summarises the common key processes of Lean internal startups. The findings suggest that an internal startup faces different challenges depending on whether it is initiated by management or by employees. A list of enablers of applying Lean startup in large companies is identified, including top management support and a cross-functional team. The two cases face different inhibitors due to the different process of inception, objective of the team and type of the product. Our contributions are threefold. First, this study is one of the first attempts to investigate the use of the Lean startup approach in large companies empirically. Second, the study shows the potential of the method-in-action framework to investigate the Lean startup approach in a non-startup context. The third is a general process of Lean internal startup and the evidence of the enablers and inhibitors of implementing it, which are both theory-informed and empirically grounded.

SEFeb 16, 2018
Innovation Initiatives in Large Software Companies: A Systematic Mapping Study

Henry Edison, Xiaofeng Wang, Ronald Jabangwe et al.

To keep their competitive advantage and adapt to changes in the market and technology, companies need to innovate in an organised, purposeful and systematic manner. However, due to their size and complexity, large companies tend to focus on maintaining their business, which can potentially lower their agility to innovate. This study aims to provide an overview of the current research on innovation initiatives and to identify the challenges of implementing the initiatives in the context of large software companies. The investigation was performed using a systematic mapping approach of published literature on corporate innovation and entrepreneurship. It was then complemented with interviews with four experts with rich industry experience. Our study results suggest that there is a lack of high-quality empirical studies on innovation initiatives in the context of large software companies. A total of 7 studies were conducted in such a context, which reported 5 types of initiatives: intrapreneurship, bootlegging, internal venture, spin-off and crowdsourcing. Our study offers three contributions. First, this paper presents a map of the existing literature on innovation initiatives inside large companies. The second contribution is to provide an innovation initiative tree. The third contribution is to identify key challenges faced by each initiative in large software companies. At the strategic and tactical levels, there is no difference between large software companies and other companies. At the operational level, large software companies are highly influenced by the advancement of Internet technology. Large software companies use the open innovation paradigm as part of their innovation initiatives. As future work, we envision further empirical evaluation of the innovation initiative tree in large software companies, involving more practitioners from different companies.

SEDec 2, 2017
What influences the speed of prototyping? An empirical investigation of twenty software startups

Anh Nguyen Duc, Xiaofeng Wang, Pekka Abrahamsson

It is essential for startups to quickly experiment with business ideas by building tangible prototypes and collecting user feedback on them. As prototyping is an inevitable part of learning for early-stage software startups, how fast startups can learn depends on how fast they can prototype. Despite its importance, there is a lack of research about prototyping in software startups. In this study, we aimed at understanding the factors influencing different types of prototyping activities. We conducted a multiple case study on twenty European software startups. The results are twofold. First, we propose a prototype-centric learning model for early-stage software startups. Second, we identify factors that act as barriers but also as facilitators for prototyping in early-stage software startups. The factors are grouped into (1) artifacts, (2) team competence, (3) collaboration, (4) customer and (5) process dimensions. To speed up a startup's progress at the early stage, it is important to incorporate the learning objective into a well-defined collaborative approach to prototyping.

CYDec 2, 2017
Exploring the outsourcing relationship in software startups: A multiple case study

Anh Nguyen Duc, Pekka Abrahamsson

Software startups are becoming increasingly popular in the software industry as well as in other sectors of the economy. Startups that lack the necessary competences often seek external resources from outsourcing partners. Little is known about how this outsourcing relationship works and whether it makes sense to outsource technical competence to an external party. This is among the first investigations of outsourcing relationships in software startups. By conducting exploratory case studies at six startups, we found mixed experiences with outsourcing. The experimental nature of early product development makes outsourcing a feasible option, although startups often suffer from its uncertainty and from managing commitments from partners. The results further suggest that early contract-based activities could be transformed into a long-term partnership by adopting a startup boundary spanner's role, establishing an inter-personal relationship and maintaining a mutual commitment.

SEDec 2, 2017
A survey study on major technical barriers affecting the decision to adopt cloud services

Nattakarn Phaphoom, Xiaofeng Wang, Sarah Samuel et al.

In the context of cloud computing, risks associated with underlying technologies, risks involving service models and outsourcing, and enterprise readiness have been recognized as potential barriers to adoption. To accelerate cloud adoption, the concrete barriers negatively influencing the adoption decision need to be identified. Our study aims at understanding the impact of technical and security-related barriers on the organizational decision to adopt the cloud. We analyzed data collected through a web survey of 352 individuals working for enterprises, consisting of decision makers as well as employees from other levels within an organization. The comparison of the adopter and non-adopter samples reveals three potential adoption inhibitors: security, data privacy, and portability. The result from our logistic regression analysis confirms the criticality of the security concern, which results in an up to 26-fold increase in the non-adoption likelihood. Our study underlines the importance of the technical and security perspectives for research investigating the adoption of technology.

SENov 23, 2017
Software Development Under Stringent Hardware Constraints: Do Agile Methods Have a Chance?

Jussi Ronkainen, Pekka Abrahamsson

Agile software development methods have been suggested as useful in many situations and contexts. However, only a few (if any) experiences are available regarding the use of agile methods in the embedded domain, where the hardware sets tight requirements for the software. This development domain is arguably far away from the agile home ground. This paper explores the possibility of using agile development techniques in this environment and defines the requirements for new agile methods targeted to facilitate the development of embedded software. The findings are based on an empirical study over a period of 12 months in the development of low-level telecommunications software. We maintain that by addressing the requirements we discovered, agile methods can be successful also in the embedded software domain.

SENov 14, 2017
A Comparative Case Study on the Impact of Test-Driven Development on Program Design and Test Coverage

Maria Siniaalto, Pekka Abrahamsson

Test-driven development (TDD) is a programming technique in which the tests are written prior to the source code. It is proposed that TDD is one of the most fundamental practices enabling the development of software in an agile and iterative manner. Both the literature and practice suggest that TDD practice yields several benefits. Essentially, it is claimed that TDD leads to an improved software design, which has a dramatic impact on the maintainability and further development of the system. The impact of TDD on program design has seldom come under the researchers' focus. This paper reports the results from a comparative case study of three software development projects where the effect of TDD on program design was measured using object-oriented metrics. The results show that the effect of TDD on program design was not as evident as expected, but the test coverage was significantly superior to iterative test-last development.

CYOct 29, 2017
How Do Software Startups Pivot? Empirical Results from a Multiple Case Study

Sohaib Shahid Bajwa, Xiaofeng Wang, Anh Nguyen Duc et al.

In order to handle intense time pressure and survive in a dynamic market, software startups have to make crucial decisions constantly on whether to change directions or stay on chosen courses, or in the terms of Lean Startup, to pivot or to persevere. The existing research and knowledge on software startup pivots are very limited. In this study, we focused on understanding the pivoting processes of software startups, and identified the triggering factors and pivot types. To achieve this, we employed a multiple case study approach, and analyzed the data obtained from four software startups. The initial findings show that different software startups make different types of pivots related to business and technology during their product development life cycle. The pivots are triggered by various factors including negative customer feedback.

SEOct 11, 2017
Failures to be celebrated: an analysis of major pivots of software startups

Sohaib Shahid Bajwa, Xiaofeng Wang, Anh Nguyen Duc et al.

In the context of software startups, project failure is embraced actively and considered crucial to obtain validated learning that can lead to pivots. A pivot is the strategic change of a business concept, product or the different elements of a business model. A better understanding is needed of the different types of pivots and the different factors that lead to failures and trigger pivots, so that software entrepreneurial teams can make better decisions in chaotic and unpredictable environments. Due to the nascent nature of the topic, the existing research and knowledge on the pivots of software startups are very limited. In this study, we aimed at identifying the major types of pivots that software startups make during their startup processes, and highlighting the factors that cause software projects to fail and trigger pivots. To achieve this, we conducted a case survey study based on secondary data on the major pivots that happened in 49 software startups. 10 pivot types and 14 triggering factors were identified. The findings show that the customer need pivot is the most common among all pivot types. Together with the customer segment pivot, these are the common market-related pivots. The major product-related pivots are zoom-in and technology pivots. Several new pivot types were identified, including market zoom-in, complete and side project pivots. Our study also demonstrates that negative customer reaction and a flawed business model are the most common factors that trigger pivots in software startups. Our study extends the research knowledge on software startup pivot types and pivot triggering factors. Meanwhile, it provides practical knowledge to software startups, which they can utilize to guide their decisions on pivoting.

SESep 25, 2017
Agile Software Development Methods: Review and Analysis

Pekka Abrahamsson, Outi Salo, Jussi Ronkainen et al.

Agile - denoting "the quality of being agile, readiness for motion, nimbleness, activity, dexterity in motion" - software development methods are attempting to offer an answer to the eager business community asking for lighter weight along with faster and nimbler software development processes. This is especially the case with the rapidly growing and volatile Internet software industry as well as for the emerging mobile application environment. The new agile methods have evoked a substantial amount of literature and debate. However, academic research on the subject is still scarce, as most existing publications are written by practitioners or consultants. The aim of this publication is to begin filling this gap by systematically reviewing the existing literature on agile software development methodologies. This publication has three purposes. First, it proposes a definition and a classification of agile software development approaches. Second, it analyses ten software development methods that can be characterized as being "agile" against the defined criterion. Third, it compares these methods and highlights their similarities and differences. Based on this analysis, future research needs are identified and discussed.

SESep 22, 2017
Female Leadership in Software Projects: A Preliminary Result on Leadership Style and Project Context Factors

Anh Nguyen-Duc, Soudabeh Khodambashi, Jon Atle Gulla et al.

Women have been shown to be effective leaders in many team-based situations. However, it is also well-recognized that women are underrepresented in engineering and technology areas, which leads to wasted efforts and a lack of diversity in professional organizations. Although studies about gender and leadership are rich, research focusing on engineering-specific activities is scarce. To address this gap, we explored the experience of female leaders of software development projects and possible context factors that influence leadership effectiveness. The study was conducted as a longitudinal multiple case study. Data was collected from surveys, interviews, observation and project reports. In this work, we report some preliminary findings related to leadership style, team perception of leadership and team-task context factors. We found a strong correlation between perceived team leadership and task management. We also observed a potential association between a human-oriented leading approach in low customer involvement scenarios and a task-oriented leading approach in high customer involvement situations.

SESep 22, 2017
Making the leap to a software platform strategy: Issues and challenges

Yaser Ghanam, Frank Maurer, Pekka Abrahamsson

Context: While there are many success stories of achieving high reuse and improved quality using software platforms, there is a need to investigate the issues and challenges organizations face when transitioning to a software platform strategy. Objective: This case study provides a comprehensive taxonomy of the challenges faced when a medium-scale organization decided to adopt software platforms. The study also reveals how new trends in software engineering (i.e. agile methods, distributed development and flat management structures) interplayed with the chosen platform strategy. Method: We used an ethnographic approach to collect data by spending time at a medium-scale company in Scandinavia. We conducted 16 in-depth interviews with representatives of eight different teams, three of which were working on three separate platforms. The collected data was analyzed using Grounded Theory. Results: The findings identify four classes of challenges, namely: business challenges, organizational challenges, technical challenges, and people challenges. The article explains how these findings can be used to help researchers and practitioners identify practical solutions and required tool support. Conclusion: The organization's decision to adopt a software platform strategy introduced a number of challenges. These challenges need to be understood and addressed in order to reap the benefits of reuse. Researchers need to further investigate issues such as supportive organizational structures for platform development, the role of agile methods in software platforms, tool support for testing and continuous integration in the platform context, and reuse recommendation systems.

SESep 20, 2017
Achieving CMMI Level 2 with Enhanced Extreme Programming Approach

Tuomo Kähkönen, Pekka Abrahamsson

The relationship between agile methods and the Software Engineering Institute's CMM approach is often debated. Some authors argue that the approaches are compatible, while others have criticized the application of agile methods from the CMM perspective. Only a few CMM-based assessments have been performed on projects using agile approaches. This paper explores an empirical case where a project using an Extreme Programming (XP) based approach was assessed using the CMMI framework. The results provide empirical evidence pointing out that it is possible to achieve maturity level 2 with an approach based on XP. Yet, the results confirm that XP, as it is defined, is not sufficient. This study demonstrates that it is possible to use the CMMI for assessing and improving agile processes. However, the analysis reveals that assessing an agile organization requires more interpretations than normally would be the case. It is further concluded that the CMMI model does not always support interpretations in an agile context.

SESep 20, 2017
Mobile-D: An Agile Approach for Mobile Application Development

Pekka Abrahamsson, Antti Hanhineva, Hanna Hulkko et al.

Mobile phones have been closed environments until recent years. The change brought by open platform technologies such as the Symbian operating system and Java technologies has opened up a significant business opportunity for anyone to develop application software such as games for mobile terminals. However, developing mobile applications is currently a challenging task due to the specific demands and technical constraints of mobile development. Furthermore, at the moment very little is known about the suitability of the different development processes for mobile application development. Due to these issues, we have developed an agile development approach called Mobile-D. The Mobile-D approach is briefly outlined here and the experiences gained from four case studies are discussed.

SESep 14, 2017
Why Early-Stage Software Startups Fail: A Behavioral Framework

Carmine Giardino, Xiaofeng Wang, Pekka Abrahamsson

Software startups are newly created companies with little operating history, oriented towards producing cutting-edge products. As their time and resources are extremely scarce, and one failed project can put them out of business, startups need effective practices to face those unique challenges. However, only a few scientific studies attempt to address characteristics of failure, especially during the early stage. With this study we aim to improve our understanding of the failure of early-stage software startup companies. This state-of-practice investigation was performed using a literature review followed by a multiple-case study approach. The results present, by means of a behavioral framework, how inconsistency between managerial strategies and execution can lead to failure. Although strategies reveal the initial need to understand the problem/solution fit, actual execution prioritizes developing the product and launching it on the market as quickly as possible to verify product/market fit, neglecting the necessary learning process.