46.1SEApr 16
The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SERobert Feldt, Per Lenberg, Julian Frattini et al.
AI-based systems, currently driven largely by LLMs and tool-using agentic harnesses, are increasingly discussed as a possible threat to software engineering. Foundation models get stronger, agents can plan and act across multiple steps, and tasks such as scaffolding, routine test generation, straightforward bug fixing, and small integration work look more exposed than they did only a few years ago. The result is visible unease not only among students and junior developers, but also among experienced practitioners who worry that hard-won expertise may lose value. This paper argues for a different reading. The important shift is not that software engineering loses relevance. It is that the thing being engineered expands beyond executable code to semi-executable artifacts; combinations of natural language, tools, workflows, control mechanisms, and organizational routines whose enactment depends on human or probabilistic interpretation rather than deterministic execution. The Semi-Executable Stack is introduced as a six-ring diagnostic reference model for reasoning about that expansion, spanning executable artifacts, instructional artifacts, orchestrated execution, controls, operating logic, and societal and institutional fit. The model helps locate where a contribution, bottleneck, or organizational transition primarily sits, and which adjacent rings it depends on. The paper develops the argument through three worked cases, reframes familiar objections as engineering targets rather than reasons to dismiss the transition, and closes with a preserve-versus-purify heuristic for deciding which legacy software engineering processes, controls, and coordination routines should be kept and which should be simplified or redesigned. This paper is a conceptual keynote companion: diagnostic and agenda-setting rather than empirical.
25.3SEApr 9
Towards Improving the External Validity of Software Engineering Experiments with Transportability MethodsJulian Frattini, Richard Torkar, Robert Feldt et al.
Controlled experiments are a core research method in software engineering (SE) for validating causal claims. However, recruiting a sample of participants that represents the intended target population is often difficult or expensive, which limits the external validity of experimental results. At the same time, SE researchers often have access to much larger amounts of observational than experimental data (e.g., from repositories, issue trackers, logs, surveys and industrial processes). Transportability methods combine these data from experimental and observational studies to "transport" results from the experimental sample to a broader, more representative sample of the target population. Although the ability to combine observational and experimental data in a principled way could substantially benefit empirical SE research, transportability methods have - to our knowledge - not been adopted in SE. In this vision, we aim to help make that adoption possible. To that end, we introduce transportability methods, their prerequisites, and demonstrate their potential through a simulation. We then outline several SE research scenarios in which these methods could apply, e.g., how to effectively use students as substitutes for developers. Finally, we outline a road map and practical guidelines to support SE researchers in applying them. Adopting transportability methods in SE research can strengthen the external validity of controlled experiments and help the field produce results that are both more reliable and more useful in practice.
SEFeb 2, 2022
Automatic Creation of Acceptance Tests by Extracting Conditionals from Requirements: NLP Approach and Case StudyJannik Fischbach, Julian Frattini, Andreas Vogelsang et al.
Acceptance testing is crucial to determine whether a system fulfills end-user requirements. However, the creation of acceptance tests is a laborious task entailing two major challenges: (1) practitioners need to determine the right set of test cases that fully covers a requirement, and (2) they need to create test cases manually due to insufficient tool support. Existing approaches for automatically deriving test cases require semi-formal or even formal notations of requirements, though unrestricted natural language is prevalent in practice. In this paper, we present our tool-supported approach CiRA (Conditionals in Requirements Artifacts) capable of creating the minimal set of required test cases from conditional statements in informal requirements. We demonstrate the feasibility of CiRA in a case study with three industry partners. In our study, out of 578 manually created test cases, 71.8 % can be generated automatically. Additionally, CiRA discovered 80 relevant test cases that were missed in manual test case design. CiRA is publicly available at www.cira.bth.se/demo/.
SEDec 15, 2021
Causality in Requirements Artifacts: Prevalence, Detection, and ImpactJulian Frattini, Jannik Fischbach, Daniel Mendez et al.
Background: Causal relations in natural language (NL) requirements convey strong, semantic information. Automatically extracting such causal information enables multiple use cases, such as test case generation, but it also requires to reliably detect causal relations in the first place. Currently, this is still a cumbersome task as causality in NL requirements is still barely understood and, thus, barely detectable. Objective: In our empirically informed research, we aim at better understanding the notion of causality and supporting the automatic extraction of causal relations in NL requirements. Method: In a first case study, we investigate 14.983 sentences from 53 requirements documents to understand the extent and form in which causality occurs. Second, we present and evaluate a tool-supported approach, called CiRA, for causality detection. We conclude with a second case study where we demonstrate the applicability of our tool and investigate the impact of causality on NL requirements. Results: The first case study shows that causality constitutes around 28% of all NL requirements sentences. We then demonstrate that our detection tool achieves a macro-F1 score of 82% on real-world data and that it outperforms related approaches with an average gain of 11.06% in macro-Recall and 11.43% in macro-Precision. Finally, our second case study corroborates the positive correlations of causality with features of NL requirements. Conclusion: The results strengthen our confidence in the eligibility of causal relations for downstream reuse, while our tool and publicly available data constitute a first step in the ongoing endeavors of utilizing causality in RE and beyond.
SESep 5, 2021
How Do Practitioners Interpret Conditionals in Requirements?Jannik Fischbach, Julian Frattini, Daniel Mendez et al.
Context: Conditional statements like "If A and B then C" are core elements for describing software requirements. However, there are many ways to express such conditionals in natural language and also many ways how they can be interpreted. We hypothesize that conditional statements in requirements are a source of ambiguity, potentially affecting downstream activities such as test case generation negatively. Objective: Our goal is to understand how specific conditionals are interpreted by readers who work with requirements. Method: We conduct a descriptive survey with 104 RE practitioners and ask how they interpret 12 different conditional clauses. We map their interpretations to logical formulas written in Propositional (Temporal) Logic and discuss the implications. Results: The conditionals in our tested requirements were interpreted ambiguously. We found that practitioners disagree on whether an antecedent is only sufficient or also necessary for the consequent. Interestingly, the disagreement persists even when the system behavior is known to the practitioners. We also found that certain cue phrases are associated with specific interpretations. Conclusion: Conditionals in requirements are a source of ambiguity and there is not just one way to interpret them formally. This affects any analysis that builds upon formalized requirements (e.g., inconsistency checking, test-case generation). Our results may also influence guidelines for writing requirements.
CLAug 2, 2021
Transfer Learning for Mining Feature Requests and Bug Reports from Tweets and App Store ReviewsPablo Restrepo Henao, Jannik Fischbach, Dominik Spies et al.
Identifying feature requests and bug reports in user comments holds great potential for development teams. However, automated mining of RE-related information from social media and app stores is challenging since (1) about 70% of user comments contain noisy, irrelevant information, (2) the amount of user comments grows daily making manual analysis unfeasible, and (3) user comments are written in different languages. Existing approaches build on traditional machine learning (ML) and deep learning (DL), but fail to detect feature requests and bug reports with high Recall and acceptable Precision which is necessary for this task. In this paper, we investigate the potential of transfer learning (TL) for the classification of user comments. Specifically, we train both monolingual and multilingual BERT models and compare the performance with state-of-the-art methods. We found that monolingual BERT models outperform existing baseline methods in the classification of English App Reviews as well as English and Italian Tweets. However, we also observed that the application of heavyweight TL models does not necessarily lead to better performance. In fact, our multilingual BERT models perform worse than traditional ML methods.
CLJul 21, 2021
CATE: CAusality Tree Extractor from Natural Language RequirementsNoah Jadallah, Jannik Fischbach, Julian Frattini et al.
Causal relations (If A, then B) are prevalent in requirements artifacts. Automatically extracting causal relations from requirements holds great potential for various RE activities (e.g., automatic derivation of suitable test cases). However, we lack an approach capable of extracting causal relations from natural language with reasonable performance. In this paper, we present our tool CATE (CAusality Tree Extractor), which is able to parse the composition of a causal relation as a tree structure. CATE does not only provide an overview of causes and effects in a sentence, but also reveals their semantic coherence by translating the causal relation into a binary tree. We encourage fellow researchers and practitioners to use CATE at https://causalitytreeextractor.com/
CLJul 21, 2021
Fine-Grained Causality Extraction From Natural Language Requirements Using Recursive Neural Tensor NetworksJannik Fischbach, Tobias Springer, Julian Frattini et al.
[Context:] Causal relations (e.g., If A, then B) are prevalent in functional requirements. For various applications of AI4RE, e.g., the automatic derivation of suitable test cases from requirements, automatically extracting such causal statements are a basic necessity. [Problem:] We lack an approach that is able to extract causal relations from natural language requirements in fine-grained form. Specifically, existing approaches do not consider the combinatorics between causes and effects. They also do not allow to split causes and effects into more granular text fragments (e.g., variable and condition), making the extracted relations unsuitable for automatic test case derivation. [Objective & Contributions:] We address this research gap and make the following contributions: First, we present the Causality Treebank, which is the first corpus of fully labeled binary parse trees representing the composition of 1,571 causal requirements. Second, we propose a fine-grained causality extractor based on Recursive Neural Tensor Networks. Our approach is capable of recovering the composition of causal statements written in natural language and achieves a F1 score of 74 % in the evaluation on the Causality Treebank. Third, we disclose our open data sets as well as our code to foster the discourse on the automatic extraction of causality in the RE community.
SEMar 11, 2021
CiRA: A Tool for the Automatic Detection of Causal Relationships in Requirements ArtifactsJannik Fischbach, Julian Frattini, Andreas Vogelsang
Requirements often specify the expected system behavior by using causal relations (e.g., If A, then B). Automatically extracting these relations supports, among others, two prominent RE use cases: automatic test case derivation and dependency detection between requirements. However, existing tools fail to extract causality from natural language with reasonable performance. In this paper, we present our tool CiRA (Causality detection in Requirements Artifacts), which represents a first step towards automatic causality extraction from requirements. We evaluate CiRA on a publicly available data set of 61 acceptance criteria (causal: 32; non-causal: 29) describing the functionality of the German Corona-Warn-App. We achieve a macro F_1 score of 83%, which corroborates the feasibility of our approach.
SEJan 26, 2021
Automatic Detection of Causality in Requirement Artifacts: the CiRA ApproachJannik Fischbach, Julian Frattini, Arjen Spaans et al.
System behavior is often expressed by causal relations in requirements (e.g., If event 1, then event 2). Automatically extracting this embedded causal knowledge supports not only reasoning about requirements dependencies, but also various automated engineering tasks such as seamless derivation of test cases. However, causality extraction from natural language is still an open research challenge as existing approaches fail to extract causality with reasonable performance. We understand causality extraction from requirements as a two-step problem: First, we need to detect if requirements have causal properties or not. Second, we need to understand and extract their causal relations. At present, though, we lack knowledge about the form and complexity of causality in requirements, which is necessary to develop a suitable approach addressing these two problems. We conduct an exploratory case study with 14,983 sentences from 53 requirements documents originating from 18 different domains and shed light on the form and complexity of causality in requirements. Based on our findings, we develop a tool-supported approach for causality detection (CiRA). This constitutes a first step towards causality extraction from NL requirements. We report on a case study and the resulting tool-supported approach for causality detection in requirements. Our case study corroborates, among other things, that causality is, in fact, a widely used linguistic pattern to describe system behavior, as about a third of the analyzed sentences are causal. We further demonstrate that our tool CiRA achieves a macro-F1 score of 82 % on real word data and that it outperforms related approaches with an average gain of 11.06 % in macro-Recall and 11.43 % in macro-Precision. Finally, we disclose our open data sets as well as our tool to foster the discourse on the automatic detection of causality in the RE community.
SEJan 19, 2021
Assets in Software Engineering: What are they after all?Ehsan Zabardast, Julian Frattini, Javier Gonzalez-Huerta et al.
During the development and maintenance of software-intensive products or services, we depend on various artefacts. Some of those artefacts, we deem central to the feasibility of a project and the product's final quality. Typically, these central artefacts are referred to as assets. However, despite their central role in the software development process, little thought is yet invested into what eventually characterises as an asset, often resulting in many terms and underlying concepts being mixed and used inconsistently. A precise terminology of assets and related concepts, such as asset degradation, are crucial for setting up a new generation of cost-effective software engineering practices. In this position paper, we critically reflect upon the notion of assets in software engineering. As a starting point, we define the terminology and concepts of assets and extend the reasoning behind them. We explore assets' characteristics and discuss what asset degradation is as well as its various types and the implications that asset degradation might bring for the planning, realisation, and evolution of software-intensive products and services over time. We aspire to contribute to a more standardised definition of assets in software engineering and foster research endeavours and their practical dissemination in a common, more unified direction.