16.3 SE May 27
Multi-Agent LLM-based Metamorphic Testing for REST APIs
Shehroz Khan, Abdullah Mughees, Gaadha Sudheerbabu et al.
As REST APIs become an increasingly significant part of software systems, their validation is becoming more critical. Hence, testing and uncovering underlying issues are of utmost importance for improving software quality. However, testing REST APIs is challenging mainly due to the difficulty of assessing whether the output of an API call is correct, i.e., the test oracle problem. Metamorphic testing is a specification-based testing approach for situations where correct outputs are unknown or not specified explicitly. To check the correctness of a system, relations between the different outputs are specified. We present ARMeta, a tool-supported approach that uses an LLM-based multi-agent workflow to support metamorphic testing of REST APIs documented with OpenAPI. The agentic workflow is used to identify metamorphic test scenarios and specify them in the Given-When-Then format. These scenarios are automatically implemented as executable tests and executed against the system under test. We evaluate ARMeta on two publicly available web applications that expose REST interfaces and compare its performance with a scenario-based testing baseline. The results show that ARMeta explores behaviors that serve as a complement to existing scenario-based testing approaches.
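As an illustration of what a metamorphic relation for a REST API can look like, here is a minimal sketch in the Given-When-Then shape. It is not ARMeta's output: the `get_items` function is a hypothetical in-memory stand-in for an endpoint such as `GET /items?category=...`, and the relation chosen (a filtered listing is a subset of the unfiltered listing) is only one common example. No expected output is needed for any single call; only the relation between two calls is checked.

```python
from typing import Dict, List, Optional

# Hypothetical data behind an assumed endpoint such as GET /items?category=...
ITEMS: List[Dict] = [
    {"id": 1, "category": "book"},
    {"id": 2, "category": "toy"},
    {"id": 3, "category": "book"},
]


def get_items(category: Optional[str] = None) -> List[Dict]:
    """Stand-in for the system under test; a real test would issue an HTTP call."""
    if category is None:
        return list(ITEMS)
    return [item for item in ITEMS if item["category"] == category]


def check_filter_subset_relation(category: str) -> bool:
    """Metamorphic relation: filtering can only narrow the result set."""
    # Given the unfiltered listing (source test case)
    source_ids = {item["id"] for item in get_items()}
    # When the same request is repeated with an extra filter (follow-up test case)
    follow_up_ids = {item["id"] for item in get_items(category=category)}
    # Then every filtered item must also appear in the unfiltered listing
    return follow_up_ids <= source_ids
```

The check passes or fails without anyone specifying which items the API should return, which is how metamorphic testing sidesteps the test oracle problem.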
CV Jun 21, 2021
Affect-driven Ordinal Engagement Measurement from Video
Ali Abedi, Shehroz Khan
In education and intervention programs, user engagement has been identified as a major factor in successful program completion. Automatic measurement of user engagement provides helpful information for instructors to meet program objectives and individualize program delivery. In this paper, we present a novel approach for video-based engagement measurement in virtual learning programs. We propose to use affect states, continuous values of valence and arousal extracted from consecutive video frames, along with a new latent affective feature vector and behavioral features for engagement measurement. Deep-learning sequential models are trained and validated on the extracted frame-level features. In addition, because engagement is an ordinal variable, we develop ordinal versions of the above models to address engagement measurement as an ordinal classification problem. We evaluated the performance of the proposed method on the only two publicly available video engagement measurement datasets, DAiSEE and EmotiW-EW, containing videos of students in online learning programs. Our experiments show a state-of-the-art engagement level classification accuracy of 67.4% on the DAiSEE dataset, and a regression mean squared error of 0.0508 on the EmotiW-EW dataset. Our ablation study shows the effectiveness of incorporating affect states and ordinality of engagement in engagement measurement.
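To make the idea of ordinal classification concrete, the sketch below shows one common encoding, in which a label with K ordered levels becomes K-1 binary "is the level greater than k?" targets. The abstract does not say which ordinal formulation the authors use, so this is an assumed, generic scheme; the function names and the choice of four levels in the usage note are illustrative.

```python
from typing import List


def ordinal_encode(level: int, num_levels: int) -> List[int]:
    """Encode an ordinal level as num_levels - 1 cumulative binary targets.

    Target k is 1 when level > k, so neighbouring levels differ in one bit.
    """
    if not 0 <= level < num_levels:
        raise ValueError("level must lie in [0, num_levels)")
    return [1 if level > k else 0 for k in range(num_levels - 1)]


def ordinal_decode(probabilities: List[float], threshold: float = 0.5) -> int:
    """Recover a level by counting the binary sub-problems answered 'yes'."""
    return sum(1 for p in probabilities if p > threshold)
```

With four assumed levels, `ordinal_encode(2, 4)` yields `[1, 1, 0]`, and a model predicting `[0.9, 0.7, 0.2]` for the three sub-problems decodes to level 2. Unlike one-hot targets, this encoding penalises a prediction of level 0 for a true level 3 more than a prediction of level 2.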
CV Dec 3, 2018
Learning to Unlearn: Building Immunity to Dataset Bias in Medical Imaging Studies
Ahmed Ashraf, Shehroz Khan, Nikhil Bhagwat et al.
Medical imaging machine learning algorithms are usually evaluated on a single dataset. Although training and testing are performed on different subsets of the dataset, models built on one study show limited capability to generalize to other studies. While database bias has been recognized as a serious problem in the computer vision community, it has remained largely unnoticed in medical imaging research. Transfer learning thus remains confined to the re-use of feature representations requiring re-training on the new dataset. As a result, machine learning models do not generalize even when trained on imaging datasets that were captured to study the same variable of interest. The ability to transfer knowledge gleaned from one study to another, without the need for re-training, if possible, would provide reassurance that the models are learning knowledge fundamental to the problem under study instead of latching onto the idiosyncrasies of a dataset. In this paper, we situate the problem of dataset bias in the context of medical imaging studies. We show empirical evidence that such a problem exists in medical datasets. We then present a framework to unlearn study membership as a means to handle the problem of database bias. Our main idea is to take the data from the original feature space to an intermediate space where the data points are indistinguishable in terms of which study they come from, while maintaining the recognition capability with respect to the variable of interest. This will promote models which learn the more general properties of the etiology under study instead of aligning to dataset-specific peculiarities. Essentially, our proposed model learns to unlearn the dataset bias.
AI Jun 25, 2012
Relational Approach to Knowledge Engineering for POMDP-based Assistance Systems as a Translation of a Psychological Model
Marek Grzes, Jesse Hoey, Shehroz Khan et al.
Assistive systems for persons with cognitive disabilities (e.g. dementia) are difficult to build due to the wide range of different approaches people can take to accomplishing the same task, and the significant uncertainties that arise from both the unpredictability of clients' behaviours and from noise in sensor readings. Partially observable Markov decision process (POMDP) models have been used successfully as the reasoning engine behind such assistive systems for small multi-step tasks such as hand washing. POMDP models are a powerful, yet flexible framework for modelling assistance that can deal with uncertainty and utility. Unfortunately, POMDPs usually require a very labour-intensive, manual procedure for their definition and construction. Our previous work has described a knowledge-driven method for automatically generating POMDP activity recognition and context-sensitive prompting systems for complex tasks. We call the resulting POMDP a SNAP (SyNdetic Assistance Process). The spreadsheet-like result of the analysis does not correspond to the POMDP model directly, so a translation to a formal POMDP representation is required. To date, this translation had to be performed manually by a trained POMDP expert. In this paper, we formalise and automate this translation process using a probabilistic relational model (PRM) encoded in a relational database. We demonstrate the method by eliciting three assistance tasks from non-experts. We validate the resulting POMDP models using case-based simulations to show that they are reasonable for the domains. We also show a complete case study of a designer specifying one database, including an evaluation in a real-life experiment with a human actor.
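For readers unfamiliar with how a POMDP copes with noisy sensors, the following minimal sketch shows the standard Bayesian belief update, b'(s') ∝ P(o | s', a) · Σ_s P(s' | s, a) · b(s). It is a generic textbook update, not the SNAP system or its PRM translation; the state names and probabilities in the usage note are invented for illustration.

```python
from typing import Dict

Belief = Dict[str, float]


def belief_update(
    belief: Belief,
    transition: Dict[str, Dict[str, float]],
    observation_prob: Dict[str, float],
) -> Belief:
    """Return the posterior belief after one action and one observation.

    transition[s][s2]    = P(s2 | s, a) for the action a that was taken
    observation_prob[s2] = P(o | s2, a) for the observation o that was received
    """
    states = list(belief)
    # Predict the next state, then weight by how well it explains the observation
    unnormalised = {
        s2: observation_prob[s2] * sum(transition[s][s2] * belief[s] for s in states)
        for s2 in states
    }
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("observation has zero probability under the current belief")
    return {s2: value / total for s2, value in unnormalised.items()}
```

As a hypothetical hand-washing example, take states `hands_dry` and `hands_wet`, a uniform prior, a prompt that wets dry hands with probability 0.8, and a water sensor that fires with probability 0.9 when hands are wet and 0.1 when they are dry. After the prompt and a positive sensor reading, the update shifts nearly all of the belief onto `hands_wet`, even though neither the action outcome nor the sensor is fully reliable.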