SESep 20, 2021Code
Metamorphic Relation Prioritization for Effective Regression TestingMadhusudan Srinivasan, Upulee Kanewala
Metamorphic testing (MT) is widely used for testing programs that face the oracle problem. It uses a set of metamorphic relations (MRs), which are relations among multiple inputs and their corresponding outputs to determine whether the program under test is faulty. Typically, MRs vary in their ability to detect faults in the program under test, and some MRs tend to detect the same set of faults. In this paper, we propose approaches to prioritize MRs to improve the efficiency and effectiveness of MT for regression testing. We present two MR prioritization approaches: (1) fault-based and (2) coverage-based. To evaluate these MR prioritization approaches, we conduct experiments on three complex open-source software systems. Our results show that the MR prioritization approaches developed by us significantly outperform the current practice of executing the source and follow-up test cases of the MRs in an ad-hoc manner in terms of fault detection effectiveness. Further, fault-based MR prioritization leads to reducing the number of source and follow-up test cases that needs to be executed as well as reducing the average time taken to detect a fault, which would result in saving time and cost during the testing process.
CLMay 9, 2025
Efficient Fairness Testing in Large Language Models: Prioritizing Metamorphic Relations for Bias DetectionSuavis Giramata, Madhusudan Srinivasan, Venkat Naidu Gudivada et al.
Large Language Models (LLMs) are increasingly deployed in various applications, raising critical concerns about fairness and potential biases in their outputs. This paper explores the prioritization of metamorphic relations (MRs) in metamorphic testing as a strategy to efficiently detect fairness issues within LLMs. Given the exponential growth of possible test cases, exhaustive testing is impractical; therefore, prioritizing MRs based on their effectiveness in detecting fairness violations is crucial. We apply a sentence diversity-based approach to compute and rank MRs to optimize fault detection. Experimental results demonstrate that our proposed prioritization approach improves fault detection rates by 22% compared to random prioritization and 12% compared to distance-based prioritization, while reducing the time to the first failure by 15% and 8%, respectively. Furthermore, our approach performs within 5% of fault-based prioritization in effectiveness, while significantly reducing the computational cost associated with fault labeling. These results validate the effectiveness of diversity-based MR prioritization in enhancing fairness testing for LLMs.
CLApr 4, 2025
Metamorphic Testing for Fairness Evaluation in Large Language Models: Identifying Intersectional Bias in LLaMA and GPTHarishwar Reddy, Madhusudan Srinivasan, Upulee Kanewala
Large Language Models (LLMs) have made significant strides in Natural Language Processing but remain vulnerable to fairness-related issues, often reflecting biases inherent in their training data. These biases pose risks, particularly when LLMs are deployed in sensitive areas such as healthcare, finance, and law. This paper introduces a metamorphic testing approach to systematically identify fairness bugs in LLMs. We define and apply a set of fairness-oriented metamorphic relations (MRs) to assess the LLaMA and GPT model, a state-of-the-art LLM, across diverse demographic inputs. Our methodology includes generating source and follow-up test cases for each MR and analyzing model responses for fairness violations. The results demonstrate the effectiveness of MT in exposing bias patterns, especially in relation to tone and sentiment, and highlight specific intersections of sensitive attributes that frequently reveal fairness faults. This research improves fairness testing in LLMs, providing a structured approach to detect and mitigate biases and improve model robustness in fairness-sensitive applications.
SEApr 16, 2019
Metamorphic Testing for Quality Assurance of Protein Function Prediction ToolsMorteza Pourreza Shahri, Madhusudan Srinivasan, Gillian Reynolds et al.
Proteins are the workhorses of life and gaining insight on their functions is of paramount importance for applications such as drug design. However, the experimental validation of functions of proteins is highly-resource consuming. Therefore, recently, automated protein function prediction (AFP) using machine learning has gained significant interest. Many of these AFP tools are based on supervised learning models trained using existing gold-standard functional annotations, which are known to be incomplete. The main challenge associated with conducting systematic testing on AFP software is the lack of a test oracle, which determines passing or failing of a test case; unfortunately, due to the incompleteness of gold-standard data, the exact expected outcomes are not well defined for the AFP task. Thus, AFP tools face the \emph{oracle problem}. In this work, we use metamorphic testing (MT) to test nine state-of-the-art AFP tools by defining a set of metamorphic relations (MRs) that apply input transformations to protein sequences. According to our results, we observe that several AFP tools fail all the test cases causing concerns over the quality of their predictions.
SEFeb 20, 2018
Quality Assurance of Bioinformatics Software: A Case Study of Testing a Biomedical Text Processing Tool Using Metamorphic TestingMadhusudan Srinivasan, Morteza Pourreza Shahri, Upulee Kanewala et al.
Bioinformatics software plays a very important role in making critical decisions within many areas including medicine and health care. However, most of the research is directed towards developing tools, and little time and effort is spent on testing the software to assure its quality. In testing, a test oracle is used to determine whether a test is passed or failed during testing, and unfortunately, for much of bioinformatics software, the exact expected outcomes are not well defined. Thus, the main challenge associated with conducting systematic testing on bioinformatics software is the oracle problem. Metamorphic testing (MT) is a technique used to test programs that face the oracle problem. MT uses metamorphic relations (MRs) to determine whether a test has passed or failed and specifies how the output should change according to a specific change made to the input. In this work, we use MT to test LingPipe, a tool for processing text using computational linguistics, often used in bioinformatics for bio-entity recognition from biomedical literature. First, we identify a set of MRs for testing any bio-entity recognition program. Then we develop a set of test cases that can be used to test LingPipe's bio-entity recognition functionality using these MRs. To evaluate the effectiveness of this testing process, we automatically generate a set of faulty versions of LingPipe. According to our analysis of the experimental results, we observe that our MRs can detect the majority of these faulty versions, which shows the utility of this testing technique for quality assurance of bioinformatics software.