35.2 SE Mar 18
The Software Engineering Simulations Lab: Agentic AI for RE Quality Simulations
Henning Femmer, Ivan Esau
Context and motivation: Requirements Engineering (RE) quality still lacks empirical evidence on how specific requirement defects affect downstream activities. Problem: Empirical data on the detailed effects of requirements quality defects is scarce, since it is costly to obtain. Furthermore, with the advent of AI-based development, the requirements quality factors may change: Requirements are no longer consumed only by humans, but increasingly also by AI agents, which might change what counts as an efficient and effective requirements style. Principal ideas: We propose to extend the RE research toolbox with Agentic AI simulations, in which software engineering (SE) processes are replicated by standardized agents in qualitative simulations. We argue that their speed and simplicity make them a valuable addition to RE research, although limitations in replicating human behavior need to be studied and understood. Contribution: This paper contributes a first concept, a research roadmap, a prototype, and a first feasibility study for RE simulations with agentic AI. Study results indicate that even a naïve implementation leads to executable simulations, encouraging technical improvements along with broader application in RE research.
SE Sep 5, 2021
How Do Practitioners Interpret Conditionals in Requirements?
Jannik Fischbach, Julian Frattini, Daniel Mendez et al.
Context: Conditional statements like "If A and B then C" are core elements for describing software requirements. However, there are many ways to express such conditionals in natural language and also many ways in which they can be interpreted. We hypothesize that conditional statements in requirements are a source of ambiguity, potentially affecting downstream activities such as test case generation negatively. Objective: Our goal is to understand how specific conditionals are interpreted by readers who work with requirements. Method: We conduct a descriptive survey with 104 RE practitioners and ask how they interpret 12 different conditional clauses. We map their interpretations to logical formulas written in Propositional (Temporal) Logic and discuss the implications. Results: The conditionals in our tested requirements were interpreted ambiguously. We found that practitioners disagree on whether an antecedent is only sufficient or also necessary for the consequent. Interestingly, the disagreement persists even when the system behavior is known to the practitioners. We also found that certain cue phrases are associated with specific interpretations. Conclusion: Conditionals in requirements are a source of ambiguity and there is not just one way to interpret them formally. This affects any analysis that builds upon formalized requirements (e.g., inconsistency checking, test-case generation). Our results may also influence guidelines for writing requirements.
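The difference between a "sufficient only" and a "sufficient and necessary" reading of "If A and B then C" can be made concrete with a small truth-table comparison. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it covers plain propositional logic without the temporal aspects the abstract mentions.

```python
from itertools import product


def sufficient_only(a: bool, b: bool, c: bool) -> bool:
    """Reading 1: (A and B) -> C. The antecedent is sufficient for C."""
    return (not (a and b)) or c


def necessary_and_sufficient(a: bool, b: bool, c: bool) -> bool:
    """Reading 2: (A and B) <-> C. The antecedent is also necessary for C."""
    return (a and b) == c


def disagreements() -> list[tuple[bool, bool, bool]]:
    """Return all assignments (A, B, C) on which the two readings differ."""
    return [
        (a, b, c)
        for a, b, c in product([False, True], repeat=3)
        if sufficient_only(a, b, c) != necessary_and_sufficient(a, b, c)
    ]


if __name__ == "__main__":
    for a, b, c in disagreements():
        print(f"A={a}, B={b}, C={c}")
```

The two readings differ exactly on the assignments where C holds although the antecedent does not. A test case derived under one reading would therefore expect a different outcome in those situations than one derived under the other.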
CL Jul 21, 2021
Fine-Grained Causality Extraction From Natural Language Requirements Using Recursive Neural Tensor Networks
Jannik Fischbach, Tobias Springer, Julian Frattini et al.
[Context:] Causal relations (e.g., If A, then B) are prevalent in functional requirements. For various applications of AI4RE, e.g., the automatic derivation of suitable test cases from requirements, automatically extracting such causal statements is a basic necessity. [Problem:] We lack an approach that is able to extract causal relations from natural language requirements in fine-grained form. Specifically, existing approaches do not consider the combinatorics between causes and effects. They also do not allow splitting causes and effects into more granular text fragments (e.g., variable and condition), making the extracted relations unsuitable for automatic test case derivation. [Objective & Contributions:] We address this research gap and make the following contributions: First, we present the Causality Treebank, which is the first corpus of fully labeled binary parse trees representing the composition of 1,571 causal requirements. Second, we propose a fine-grained causality extractor based on Recursive Neural Tensor Networks. Our approach is capable of recovering the composition of causal statements written in natural language and achieves an F1 score of 74% in the evaluation on the Causality Treebank. Third, we disclose our open data sets as well as our code to foster the discourse on the automatic extraction of causality in the RE community.
SE Sep 3, 2020
What Makes Agile Test Artifacts Useful? An Activity-Based Quality Model from a Practitioners' Perspective
Jannik Fischbach, Henning Femmer, Daniel Mendez et al.
Background: The artifacts used in Agile software testing and the reasons why these artifacts are used are fairly well understood. However, empirical research on how Agile test artifacts are eventually designed in practice and which quality factors make them useful for software testing remains sparse. Aims: Our objective is two-fold. First, we identify current challenges in using test artifacts to understand why certain quality factors are considered good or bad. Second, we build an Activity-Based Artifact Quality Model that describes what Agile test artifacts should look like. Method: We conduct an industrial survey with 18 practitioners from 12 companies operating in seven different domains. Results: Our analysis reveals nine challenges and 16 factors describing the quality of six test artifacts from the perspective of Agile testers. Interestingly, we observed mostly challenges regarding language and traceability, which are well known to occur in non-Agile projects. Conclusions: Although Agile software testing is becoming the norm, we still have little confidence about general do's and don'ts going beyond conventional wisdom. This study is the first to distill a list of quality factors deemed important to what can be considered useful test artifacts.
SE Feb 7, 2020
How do Quantifiers Affect the Quality of Requirements?
Katharina Winter, Henning Femmer, Andreas Vogelsang
Context: Requirements quality can have a substantial impact on the effectiveness and efficiency of using requirements artifacts in a development process. Quantifiers such as "at least", "all", or "exactly" are common language constructs used to express requirements. Quantifiers can be formulated by affirmative phrases ("At least") or negative phrases ("Not less than"). Problem: It has long been assumed that negation in quantification negatively affects the readability of requirements; however, empirical research on these topics remains sparse. Principal Idea: In a web-based experiment with 51 participants, we compare the impact of negations and quantifiers on readability in terms of reading effort, reading error rate, and perceived reading difficulty of requirements. Results: For 5 out of 9 quantifiers, our participants performed better on the affirmative phrase compared to the negative phrase. For only one quantifier was the negative phrase more effective. Contribution: This research focuses on creating an empirical understanding of the effect of language in Requirements Engineering. It furthermore provides concrete advice on how to phrase requirements.
CR Mar 20, 2018
Identifying Relevant Information Cues for Vulnerability Assessment Using CVSS
Luca Allodi, Sebastian Banescu, Henning Femmer et al.
The assessment of new vulnerabilities is an activity that accounts for information from several data sources and produces a "severity" score for the vulnerability. The Common Vulnerability Scoring System (CVSS) is the reference standard for this assessment. Yet, no guidance currently exists on which information aids a correct assessment and should therefore be considered. In this paper we address this problem by evaluating which information cues increase (or decrease) assessment accuracy. We devise a block design experiment with 67 software engineering students with varying vulnerability information and measure scoring accuracy under different information sets. We find that baseline vulnerability descriptions provided by standard vulnerability sources provide only part of the information needed to achieve an accurate vulnerability assessment. Further, we find that additional information on assets, attacks, and vulnerability type contributes to increasing the accuracy of the assessment; conversely, information on known threats misleads the assessor, decreases assessment accuracy, and should be avoided when assessing vulnerabilities. These results go in the direction of formalizing vulnerability communication to, for example, fully automate security assessments.
SE Feb 24, 2017
Does Quality of Requirements Specifications matter? Combined Results of Two Empirical Studies
Jakob Mund, Henning Femmer, Daniel Méndez Fernández et al.
Background: Requirements Engineering is crucial for project success, and to this end, many measures for quality assurance of the software requirements specification (SRS) have been proposed. Goal: However, we still need an empirical understanding of the extent to which SRS are created and used in practice, as well as the degree to which the quality of an SRS matters to subsequent development activities. Method: We studied the relevance of SRS by relying on survey research and explored the impact of quality defects in SRS by relying on a controlled experiment. Results: Our results suggest that the relevance of SRS quality depends both on particular project characteristics and on what is considered a quality defect; for instance, the domain of safety-critical systems seems to motivate intensive use of SRS as a means of communication, whereas defects hampering the pragmatic quality do not seem to be as relevant as initially thought. Conclusion: Efficient and effective quality assurance measures must be specific to carefully characterized contexts and must carefully select defect classes.