Wes Masri

SE · 7 papers · 28 citations · Novelty 48% · AI Score 24

7 Papers

CV · Jul 24, 2022
Inter-model Interpretability: Self-supervised Models as a Case Study

Ahmad Mustapha, Wael Khreich, Wes Masri

Since the earliest machine learning models, metrics such as accuracy and precision have been the de facto way to evaluate and compare trained models. However, a single metric does not fully capture the similarities and differences between models, especially in the computer vision domain. A model with high accuracy on one dataset might achieve lower accuracy on another, and the metric alone offers no insight into why. To address this problem, we build on a recent interpretability technique called Dissect to introduce inter-model interpretability, which determines how models relate to or complement each other based on the visual concepts they have learned (such as objects and materials). Towards this goal, we project 13 top-performing self-supervised models into a Learned Concepts Embedding (LCE) space that reveals proximities among models from the perspective of learned concepts. We further cross-referenced this information with the performance of these models on four computer vision tasks and 15 datasets. The experiment allowed us to categorize the models into three categories and revealed, for the first time, the types of visual concepts that different tasks require. This is a step toward designing cross-task learning algorithms.
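
One way to picture "proximity from the perspective of learned concepts" is to describe each model by how many of its units a Dissect-style analysis attributes to each concept, then compare models with cosine similarity. This is a minimal sketch under stated assumptions: the concept axes, the counts, and the choice of cosine similarity are hypothetical and are not the LCE construction used in the paper.

```python
import math
from typing import Dict, List

# Hypothetical axis order for the vectors below; a real study would use the
# concept vocabulary produced by the Dissect-style tool.
CONCEPTS = ["object", "part", "material", "texture", "color"]


def normalize(counts: List[int]) -> List[float]:
    """Turn raw per-concept unit counts into a distribution over concepts."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_models(embeddings: Dict[str, List[float]]) -> Dict[str, str]:
    """For each model, the most similar other model in concept space."""
    return {
        name: max(
            (other for other in embeddings if other != name),
            key=lambda other: cosine(vec, embeddings[other]),
        )
        for name, vec in embeddings.items()
    }


if __name__ == "__main__":
    # Illustrative numbers only (hypothetical): units labelled with each concept.
    raw = {
        "model_a": [40, 10, 5, 30, 15],
        "model_b": [38, 12, 6, 28, 16],
        "model_c": [5, 2, 20, 60, 13],
    }
    print(nearest_models({k: normalize(v) for k, v in raw.items()}))
```

In this toy data `model_a` and `model_b` have nearly identical concept profiles and so pick each other as nearest neighbours, while `model_c`, which is dominated by texture units, sits apart.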

SE · Jul 6, 2016 · Code
GUICop: Approach and Toolset for Specification-based GUI Testing

Dalal Hammoud, Fadi A. Zaraket, Wes Masri

Oracles used for testing graphical user interface (GUI) programs must account for complicating factors, such as variations in screen resolution or color scheme, when comparing observed GUI elements to expected GUI elements. Researchers have proposed fuzzy comparison rules and computationally expensive image processing techniques to tame the comparison process, since naive exact matching would otherwise be too constraining and consequently impractical. Alternatively, this paper proposes GUICop, a novel approach with a supporting toolset that takes (1) a GUI program and (2) user-defined GUI specifications characterizing the rendering behavior of the GUI elements, and checks whether the execution traces of the program satisfy the specifications. GUICop comprises the following: 1) a GUI Specification Language; 2) a Driver; 3) Instrumented GUI Libraries; 4) a Solver; and 5) a Code Weaver. The user defines the specifications of the subject GUI program using the GUI Specification Language. The Driver traverses the GUI structure of the program and generates events that drive its execution. The Instrumented GUI Libraries capture the GUI execution trace, i.e., information about the positions and visibility of the GUI elements. Finally, the Solver, enabled by code injected by the Code Weaver, checks whether the traces satisfy the specifications. GUICop was successfully evaluated on four open-source GUI applications, namely Jajuk, Gason, JEdit, and TerpPaint, which together included eight defects.
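
To make the idea of checking traces against rendering specifications concrete, here is a minimal sketch in which a trace is a list of snapshots of widget geometry and visibility, and a specification is a Python predicate over one snapshot. The `Widget` record, the two example specs, and `check_trace` are hypothetical stand-ins for GUICop's GUI Specification Language and Solver, whose actual syntax is not described here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Widget:
    """One GUI element as recorded in a trace snapshot (hypothetical format)."""
    name: str
    x: int
    y: int
    width: int
    height: int
    visible: bool


Snapshot = List[Widget]           # widgets observed after one driver event
Spec = Callable[[Snapshot], bool]


def overlaps(a: Widget, b: Widget) -> bool:
    """True if the two rectangles share any interior area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def no_visible_overlap(snapshot: Snapshot) -> bool:
    """Example spec: no two visible widgets overlap."""
    shown = [w for w in snapshot if w.visible]
    return not any(overlaps(a, b) for i, a in enumerate(shown) for b in shown[i + 1:])


def left_of(left: str, right: str) -> Spec:
    """Example spec factory: `left` lies entirely left of `right` whenever both are visible."""
    def check(snapshot: Snapshot) -> bool:
        by_name = {w.name: w for w in snapshot}
        a, b = by_name.get(left), by_name.get(right)
        if a is None or b is None or not (a.visible and b.visible):
            return True  # vacuously satisfied when either widget is absent or hidden
        return a.x + a.width <= b.x
    return check


def check_trace(trace: List[Snapshot], specs: List[Spec]) -> List[int]:
    """Indices of snapshots that violate at least one spec."""
    return [i for i, snap in enumerate(trace) if not all(spec(snap) for spec in specs)]
```

Because these example specs describe relative geometry rather than pixels, a resolution change that scales all widgets uniformly would typically leave the verdicts unchanged, which is the kind of robustness the abstract contrasts with naive matching.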

SE · Aug 28, 2018
Coincidental Correctness in the Defects4J Benchmark

Rawad Abou Assi, Chadi Trad, Marwan Maalouf et al.

Coincidental correctness (CC) arises when a defective program produces the correct output even though the defect within it was exercised. Researchers have recognized the negative impact of coincidental correctness, and the authors have previously conducted a study demonstrating its prevalence in test suites. However, that study was limited to system tests and small subjects seeded with artificial defects. In this paper, we conduct a broader study of CC that addresses the following research questions in the context of the Defects4J benchmark: RQ1: Is CC prevalent in Defects4J? RQ2: Is CC affected by the testing levels in Defects4J? RQ3: Do CC tests induce peculiar infection paths in Defects4J? RQ4: Are the infections likely to be nullified within or outside the buggy method? ....
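
The definition above reduces to a simple rule per test run. This minimal sketch assumes each run is already annotated with whether it exercised the defect and whether it passed; `RunRecord` and `cc_rate` are hypothetical names, and measuring prevalence as the share of defect-exercising runs that pass is one plausible choice, not necessarily the metric used in the study.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RunRecord:
    name: str
    exercised_defect: bool  # did execution reach the defective code?
    passed: bool            # did the test oracle accept the output?


def is_coincidentally_correct(run: RunRecord) -> bool:
    """CC as defined above: the defect was exercised, yet the output was correct."""
    return run.exercised_defect and run.passed


def cc_rate(runs: Iterable[RunRecord]) -> float:
    """Share of defect-exercising runs that nonetheless pass (0.0 if none exercise it)."""
    exercising = [r for r in runs if r.exercised_defect]
    if not exercising:
        return 0.0
    return sum(is_coincidentally_correct(r) for r in exercising) / len(exercising)


if __name__ == "__main__":
    runs = [
        RunRecord("t1", exercised_defect=True, passed=True),
        RunRecord("t2", exercised_defect=True, passed=False),
        RunRecord("t3", exercised_defect=False, passed=True),
    ]
    print(cc_rate(runs))  # 0.5: one of the two defect-exercising runs passed
```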

SE · Aug 28, 2018
CFAAR: Control Flow Alteration to Assist Repair

Chadi Trad, Rawad Abou Assi, Wes Masri et al.

We present CFAAR, a program repair assistance technique that operates by selectively altering the outcome of suspicious predicates in order to yield the expected behavior. CFAAR is applicable to defects that can be repaired by negating predicates under specific conditions. CFAAR proceeds as follows: 1) it identifies predicates such that negating them at given instances would make the failing tests exhibit correct behavior; 2) for each candidate predicate, it uses program state information to build a classifier that dictates when the predicate should be negated; 3) for each classifier, it leverages a decision tree to synthesize a patch to be presented to the developer. We evaluated our toolset using 149 defects from the IntroClass and Siemens benchmarks. CFAAR identified 91 candidate defects and generated plausible patches for 41 of them. Twelve of the patches are believed to be correct, whereas the rest provide repair assistance to the developer.
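
The three steps can be illustrated on a toy subject. Everything in this sketch is hypothetical: the faulty `average` function, the `Switch` wrapper that negates a single dynamic predicate instance, and the use of a single-threshold decision stump (a depth-1 decision tree) in place of a full decision-tree learner. It searches for instances whose negation fixes a failing test (step 1), labels program states from passing and failing runs to fit a classifier (step 2), and folds that classifier into the predicate as a candidate patch (step 3).

```python
from typing import List, Optional, Tuple

Case = Tuple[List[float], float]  # (input list, expected output)


class Switch:
    """Wraps one predicate: logs (state, outcome) for every dynamic instance and
    negates the outcome at most once, at instance index `negate_at`."""

    def __init__(self, negate_at: Optional[int] = None):
        self.negate_at = negate_at
        self.log: List[Tuple[int, bool]] = []

    def __call__(self, state: int, outcome: bool) -> bool:
        index = len(self.log)
        self.log.append((state, outcome))
        return (not outcome) if index == self.negate_at else outcome


def average(xs: List[float], switch: Switch) -> float:
    # Hypothetical faulty subject: the guard should read len(xs) > 0.
    if switch(len(xs), len(xs) >= 0):
        return sum(xs) / len(xs)
    return 0.0


def run(case: Case, switch: Switch) -> bool:
    """True if the subject produces the expected output; a crash counts as failure."""
    xs, expected = case
    try:
        return average(xs, switch) == expected
    except ZeroDivisionError:
        return False


def find_fixing_instances(case: Case) -> List[int]:
    """Step 1: predicate instances whose negation makes a failing case pass."""
    probe = Switch()
    run(case, probe)
    return [k for k in range(len(probe.log)) if run(case, Switch(negate_at=k))]


def training_data(passing: List[Case], failing: List[Case]) -> List[Tuple[int, bool]]:
    """Step 2a: label each observed state with 'should this instance be negated?'."""
    data: List[Tuple[int, bool]] = []
    labelled = [(c, False) for c in passing] + [(c, True) for c in failing]
    for case, is_failing in labelled:
        fixes = set(find_fixing_instances(case)) if is_failing else set()
        probe = Switch()
        run(case, probe)
        data += [(state, k in fixes) for k, (state, _) in enumerate(probe.log)]
    return data


def fit_stump(data: List[Tuple[int, bool]]) -> int:
    """Step 2b (simplified): threshold t minimising errors of 'negate iff state <= t'."""
    candidates = sorted({state for state, _ in data})
    return min(candidates, key=lambda t: sum((s <= t) != label for s, label in data))


def synthesize_patch(predicate: str, feature: str, threshold: int) -> str:
    """Step 3 (simplified): XOR the original predicate with the learned negation rule."""
    return f"({predicate}) != ({feature} <= {threshold})"


if __name__ == "__main__":
    passing = [([2.0, 4.0], 3.0), ([5.0], 5.0)]
    failing = [([], 0.0)]
    t = fit_stump(training_data(passing, failing))
    print(synthesize_patch("len(xs) >= 0", "len(xs)", t))
    # prints: (len(xs) >= 0) != (len(xs) <= 0)
```

Here the stump learns to negate only when `len(xs) <= 0`, so the synthesized guard is equivalent to the intended `len(xs) > 0`. On real subjects the state would contain many variables and the tree would typically need more than one split.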

SE · Aug 24, 2018
Substate Profiling for Effective Test Suite Reduction

Chadi Trad, Rawad Abou Assi, Wes Masri

Test suite reduction (TSR) aims to remove redundant test cases from regression test suites. A typical TSR approach ensures that the structural profile elements covered by the original test suite are also covered by the reduced test suite. However, structural profiles may be unable to distinguish failing runs from passing runs, which diminishes the effectiveness of TSR with respect to defect detection. This motivated us to explore state profiles, which are based on the collective values of program variables. This paper presents Substate Profiling, a new form of state profiling that enhances existing profile-based analysis techniques such as TSR and coverage-based fault localization. Compared to current approaches for capturing program states, Substate Profiling is more practical and finer grained. We evaluated our approach using thirteen multi-fault subject programs comprising 53 defects. Our study involved greedy TSR using Substate profiles and four structural profiles, namely basic-block, branch, def-use pair, and the combination of the three. For the majority of the subjects, Substate Profiling detected considerably more defects with a comparable level of reduction. Substate profiles were also found to be complementary to structural profiles in many cases, so combining both types is beneficial.
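
Greedy TSR over any kind of profile can be sketched as a set-cover loop. This sketch assumes that a substate profile element can be modelled as a hashable `(variable, value)` pair; that encoding, the test names, and the helper names are hypothetical and are not taken from the paper's implementation.

```python
from typing import Dict, Hashable, List, Set

Profiles = Dict[str, Set[Hashable]]  # test name -> profile elements it covers


def greedy_reduce(profiles: Profiles) -> List[str]:
    """Repeatedly keep the test covering the most still-uncovered elements
    (ties broken by test name) until the reduced suite covers everything."""
    uncovered: Set[Hashable] = set().union(*profiles.values())
    remaining = dict(profiles)
    selected: List[str] = []
    while uncovered:
        best = max(sorted(remaining), key=lambda t: len(remaining[t] & uncovered))
        selected.append(best)
        uncovered -= remaining.pop(best)
    return selected


def combine(*kinds: Profiles) -> Profiles:
    """Merge several profile kinds, tagging each element with its kind index."""
    tests = set().union(*(k.keys() for k in kinds))
    return {t: {(i, e) for i, k in enumerate(kinds) for e in k.get(t, set())} for t in tests}


if __name__ == "__main__":
    # Hypothetical profiles: t1 and t2 take the same branches but reach different states.
    branch = {"t1": {"b1", "b2"}, "t2": {"b1", "b2"}, "t3": {"b3"}}
    substate = {"t1": {("x", 0), ("y", 1)}, "t2": {("x", 0), ("y", -1)}, "t3": {("x", 5)}}
    print(greedy_reduce(branch))    # ['t1', 't3']
    print(greedy_reduce(substate))  # ['t1', 't2', 't3']
```

With the branch profile, `t2` looks redundant and is dropped; the finer-grained substate profile distinguishes it from `t1` because `y` takes a different value, so it survives reduction. `combine` shows one simple way to reduce over structural and substate elements together.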

SE · May 2, 2017
ACDC: Altering Control Dependence Chains for Automated Patch Generation

Rawad Abou Assi, Chadi Trad, Wes Masri

Once a failure is observed, the developer's primary concern is to identify its cause in order to repair the code that induced the incorrect behavior. Until a permanent repair is available, code repair patches are invaluable. The aim of this work is to devise an automated patch generation technique that proceeds as follows: Step1) It identifies a set of failure-causing control dependence chains that are minimal in terms of number and length. Step2) It identifies a set of predicates within the chains, along with associated execution instances, such that negating the predicates at the given instances would make the program exhibit correct behavior. Step3) For each candidate predicate, it creates a classifier that dictates when the predicate should be negated to yield correct program behavior. Step4) Before each candidate predicate, a call to the corresponding classifier is injected into the faulty program; the call passes the program state to the classifier, which returns a prediction of whether to negate the predicate. The role of the classifiers is to ensure that: 1) the predicates are not negated during passing runs; and 2) the predicates are negated at the appropriate instances within failing runs. We implemented our patch generation approach for the Java platform and evaluated our toolset using 148 defects from the IntroClass and Siemens benchmarks. The toolset identified 56 full patches and another 46 partial patches, and the classification accuracy averaged 84%.
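
Step4 amounts to replacing `if (p)` with `if (p XOR classify(state))`. This minimal sketch shows that shape on a hypothetical off-by-one `clamp` function; the hand-written `learned` rule stands in for a classifier that the technique would train from program states, and all names are illustrative.

```python
from typing import Callable, Dict

State = Dict[str, int]
Classifier = Callable[[State], bool]


def guarded(outcome: bool, state: State, classify: Classifier) -> bool:
    """Injected before a candidate predicate: flip its outcome iff the classifier says so."""
    return outcome != classify(state)


def clamp_patched(x: int, hi: int, classify: Classifier) -> int:
    """Hypothetical subject: the original predicate is x > hi + 1, intended x > hi."""
    state = {"x": x, "hi": hi}
    if guarded(x > hi + 1, state, classify):
        return hi
    return x


def learned(state: State) -> bool:
    """Stand-in for a trained classifier: negate only on the off-by-one boundary."""
    return state["x"] == state["hi"] + 1


if __name__ == "__main__":
    for x in (5, 10, 11, 12):
        print(x, clamp_patched(x, 10, learned))
```

If the classifier always returns `False`, the injected call is a no-op and the program behaves exactly as the original; the classifier's job is to return `True` only at the failure-inducing instances.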

SE · Jul 11, 2014
UCov: a User-Defined Coverage Criterion for Test Case Intent Verification

Rawad Abou Assi, Fadi A. Zaraket, Wes Masri

The goal of regression testing is to ensure that the behavior of existing code is not altered by new program changes. The primary focus of regression testing should be on code associated with: a) earlier bug fixes; and b) particular application scenarios considered important by the tester. Existing coverage criteria do not enable such focus; e.g., 100% branch coverage does not guarantee that a given bug fix is exercised or that a given application scenario is tested. Therefore, there is a need for a complementary coverage criterion in which the user can define a test requirement characterizing a given behavior to be covered, as opposed to choosing from a pool of predefined, generic program elements. We propose UCov, a user-defined coverage criterion wherein a test requirement is an execution pattern of program elements and predicates. Our proposed criterion is not meant to replace existing criteria but to complement them, as it focuses testing on important code patterns that could otherwise go untested. UCov supports test case intent verification. For example, following a bug fix, the testing team may augment the regression suite with the test case that revealed the bug. However, this test case might become obsolete due to code modifications unrelated to the bug. If the user had defined an execution pattern characterizing the bug, UCov would detect that test case intent verification failed. We implemented our methodology for the Java platform and applied it to two real-life case studies. Our implementation comprises the following: 1) an Eclipse plugin allowing the user to easily specify non-trivial test requirements; 2) the ability to cross-reference test requirements across subsequent versions of a given program; and 3) the ability to check whether user-defined test requirements were satisfied, i.e., test case intent verification.
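
A test requirement expressed as an execution pattern can be checked by matching the pattern, in order, against a recorded trace. This sketch is hypothetical: trace events are `(element, variables)` pairs, each pattern step pairs a program element with a predicate over the variables, and intervening events are allowed. It does not reflect UCov's actual pattern syntax or its Eclipse plugin.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Vars = Dict[str, Any]
Event = Tuple[str, Vars]  # (program element id, variable values at that point)


def always(_: Vars) -> bool:
    """Condition that accepts any state."""
    return True


@dataclass(frozen=True)
class Step:
    element: str
    condition: Callable[[Vars], bool]


def satisfies(trace: List[Event], pattern: List[Step]) -> bool:
    """True if the pattern's steps occur in the trace in order, gaps allowed.
    Greedy earliest matching suffices because each step tests a single event."""
    i = 0
    for element, values in trace:
        if i < len(pattern) and element == pattern[i].element and pattern[i].condition(values):
            i += 1
    return i == len(pattern)


if __name__ == "__main__":
    # Hypothetical requirement for a fixed bug: withdrawing the exact balance
    # must reach the "sufficient funds" branch.
    requirement = [
        Step("withdraw:entry", lambda v: v["amount"] == v["balance"]),
        Step("withdraw:funds_ok", always),
    ]
    v1 = [("withdraw:entry", {"amount": 50, "balance": 50}), ("withdraw:funds_ok", {})]
    v2 = [("withdraw:entry", {"amount": 50, "balance": 80}), ("withdraw:funds_ok", {})]
    print(satisfies(v1, requirement), satisfies(v2, requirement))  # True False
```

In this toy setup, `v2` models a later version in which the same test no longer exercises the exact-balance boundary; the requirement is unmet, which is the signal that test case intent verification failed.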