SEOct 30, 2023
Can ChatGPT advance software testing intelligence? An experience report on metamorphic testingQuang-Hung Luu, Huai Liu, Tsong Yueh Chen
While ChatGPT is a well-known artificial intelligence chatbot being used to answer human's questions, one may want to discover its potential in advancing software testing. We examine the capability of ChatGPT in advancing the intelligence of software testing through a case study on metamorphic testing (MT), a state-of-the-art software testing technique. We ask ChatGPT to generate candidates of metamorphic relations (MRs), which are basically necessary properties of the object program and which traditionally require human intelligence to identify. These MR candidates are then evaluated in terms of correctness by domain experts. We show that ChatGPT can be used to generate new correct MRs to test several software systems. Having said that, the majority of MR candidates are either defined vaguely or incorrect, especially for systems that have never been tested with MT. ChatGPT can be used to advance software testing intelligence by proposing MR candidates that can be later adopted for implementing tests; but human intelligence should still inevitably be involved to justify and rectify their correctness.
SEMar 28, 2025
Integrating Artificial Intelligence with Human Expertise: An In-depth Analysis of ChatGPT's Capabilities in Generating Metamorphic RelationsYifan Zhang, Dave Towey, Matthew Pike et al.
Context: This paper provides an in-depth examination of the generation and evaluation of Metamorphic Relations (MRs) using GPT models developed by OpenAI, with a particular focus on the capabilities of GPT-4 in software testing environments. Objective: The aim is to examine the quality of MRs produced by GPT-3.5 and GPT-4 for a specific System Under Test (SUT) adopted from an earlier study, and to introduce and apply an improved set of evaluation criteria for a diverse range of SUTs. Method: The initial phase evaluates MRs generated by GPT-3.5 and GPT-4 using criteria from a prior study, followed by an application of an enhanced evaluation framework on MRs created by GPT-4 for a diverse range of nine SUTs, varying from simple programs to complex systems incorporating AI/ML components. A custom-built GPT evaluator, alongside human evaluators, assessed the MRs, enabling a direct comparison between automated and human evaluation methods. Results: The study finds that GPT-4 outperforms GPT-3.5 in generating accurate and useful MRs. With the advanced evaluation criteria, GPT-4 demonstrates a significant ability to produce high-quality MRs across a wide range of SUTs, including complex systems incorporating AI/ML components. Conclusions: GPT-4 exhibits advanced capabilities in generating MRs suitable for various applications. The research underscores the growing potential of AI in software testing, particularly in the generation and evaluation of MRs, and points towards the complementarity of human and AI skills in this domain.
SENov 20, 2021
Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention NetworksZhehao Zhao, Bo Yang, Ge Li et al.
Deep learning is being used extensively in a variety of software engineering tasks, e.g., program classification and defect prediction. Although the technique eliminates the required process of feature engineering, the construction of source code model significantly affects the performance on those tasks. Most recent works was mainly focused on complementing AST-based source code models by introducing contextual dependencies extracted from CFG. However, all of them pay little attention to the representation of basic blocks, which are the basis of contextual dependencies. In this paper, we integrated AST and CFG and proposed a novel source code model embedded with hierarchical dependencies. Based on that, we also designed a neural network that depends on the graph attention mechanism.Specifically, we introduced the syntactic structural of the basic block, i.e., its corresponding AST, in source code model to provide sufficient information and fill the gap. We have evaluated this model on three practical software engineering tasks and compared it with other state-of-the-art methods. The results show that our model can significantly improve the performance. For example, compared to the best performing baseline, our model reduces the scale of parameters by 50\% and achieves 4\% improvement on accuracy on program classification task.
SEDec 19, 2020
A Declarative Metamorphic Testing Framework for Autonomous DrivingYao Deng, Xi Zheng, Tianyi Zhang et al.
Autonomous driving has gained much attention from both industry and academia. Currently, Deep Neural Networks (DNNs) are widely used for perception and control in autonomous driving. However, several fatal accidents caused by autonomous vehicles have raised serious safety concerns about autonomous driving models. Some recent studies have successfully used the metamorphic testing technique to detect thousands of potential issues in some popularly used autonomous driving models. However, prior study is limited to a small set of metamorphic relations, which do not reflect rich, real-world traffic scenarios and are also not customizable. This paper presents a novel declarative rule-based metamorphic testing framework called RMT. RMT provides a rule template with natural language syntax, allowing users to flexibly specify an enriched set of testing scenarios based on real-world traffic rules and domain knowledge. RMT automatically parses human-written rules to metamorphic relations using an NLP-based rule parser referring to an ontology list and generates test cases with a variety of image transformation engines. We evaluated RMT on three autonomous driving models. With an enriched set of metamorphic relations, RMT detected a significant number of abnormal model predictions that were not detected by prior work. Through a large-scale human study on Amazon Mechanical Turk, we further confirmed the authenticity of test cases generated by RMT and the validity of detected abnormal model predictions.
SEDec 14, 2018
Monitoring Informed Testing for IoTAhmed Abdullah, Heinz W. Schmidt, Maria Spichkova et al.
Internet of Things (IoT) systems continuously collect a large amount of data from heterogeneous "smart objects" through standardised service interfaces. A key challenge is how to use these data and relevant event logs to construct continuously adapted usage profiles and apply them to enhance testing methods, i.e., prioritization of tests for the testing of continuous integration of an IoT system. In addition, these usage profiles provide relevance weightings to analyse architecture and behaviour of the system. Based on the analysis, testing methods can predict specific system locations that are susceptible to error, and therefore suggest where expanded runtime monitoring is necessary. Furthermore, IoT aims to connect billions of "smart devices" over the network. Testing even a small IoT system connecting a few dozens of smart devices would require a network of test Virtual Machines (VMs) possibly spreading across the Fog and the Cloud. In this paper we propose a framework for testing of each IoT layer in a separate VM environment, and discuss potential difficulties with optimal VM allocation.
SEMar 12, 2015
Human Factors in Software Reliability EngineeringMaria Spichkova, Huai Liu, Mohsen Laali et al.
In this paper, we present our vision of the integration of human factors engineering into the software development process. The aim of this approach is to improve the quality of software and to deal with human errors in a systematic way.