Saeed Parsa

h-index: 22

Papers: 12

Citations: 188

Novelty: 39%

AI Score: 27

Ranked #154,907 of 194,257 authors (top 80%); #1,803 in SE (top 59%)

12 Papers

14.2 | SE | Jun 28, 2023

A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

Morteza Zakeri-Nasrabadi, Saeed Parsa, Mohammad Ramezani et al.

Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper presents a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while there is no support for many programming languages. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate codes, of which only eight datasets were publicly accessible. The main challenges in the field are the lack of reliable datasets, empirical evaluations, and hybrid methods, and the lack of focus on multi-paradigm languages. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance phase.

7.5 | SE | Jun 2, 2023

A systematic literature review on the code smells datasets and validation mechanisms

Morteza Zakeri-Nasrabadi, Saeed Parsa, Ehsan Esmaili et al.

The accuracy reported for code smell-detecting tools varies depending on the dataset used to evaluate the tools. Our survey of 45 existing datasets reveals that the adequacy of a dataset for detecting smells highly depends on relevant properties such as the size, severity level, project types, number of each type of smell, number of smells, and the ratio of smelly to non-smelly samples in the dataset. Most existing datasets support God Class, Long Method, and Feature Envy, while six smells in Fowler and Beck's catalog are not supported by any dataset. We conclude that existing datasets suffer from imbalanced samples, a lack of severity-level support, and restriction to the Java language.

9.8 | SE | Aug 20, 2022 | Code

An ensemble meta-estimator to predict source code testability

Morteza Zakeri-Nasrabadi, Saeed Parsa

Unlike most other software quality attributes, testability cannot be evaluated solely based on the characteristics of the source code. The effectiveness of the test suite and the budget assigned to testing highly impact the testability of the code under test. The size of a test suite determines the test effort and cost, while the coverage measure indicates the test effectiveness. Therefore, testability can be measured based on the coverage and number of test cases provided by a test suite, considering the test budget. This paper offers a new equation to estimate testability from the size and coverage of a given test suite. The equation has been used to label 23,000 classes belonging to 110 Java projects with their testability measure. The labeled classes were vectorized using 262 metrics. The labeled vectors were fed into a family of supervised regression algorithms to predict testability in terms of the source code metrics. Regression models predicted testability with an R² of 0.68 and a mean squared error of 0.03, which is suitable in practice. Fifteen software metrics highly affecting testability prediction were identified using a feature importance analysis technique on the learned model. Owing to new criteria, metrics, and data, the proposed models improve mean absolute error by 38% compared with the relevant study on predicting branch coverage as a test criterion. As an application of testability prediction, it is demonstrated that automated refactoring of 42 smelly Java classes, targeted at improving the 15 influential software metrics, could elevate their testability by an average of 86.87%.

7.4 | SE | Aug 20, 2022

Learning to predict test effectiveness

Morteza Zakeri-Nasrabadi, Saeed Parsa

The high cost of testing can be dramatically reduced, provided that coverability, as an inherent feature of the code under test, is predictable. This article offers a machine learning model to predict the extent to which tests could cover a class in terms of a new metric called Coverageability. The prediction model consists of an ensemble of four regression models. The learning samples consist of feature vectors, where features are source code metrics computed for a class. The samples are labeled by the Coverageability values computed for their corresponding classes. We offer a mathematical model to evaluate test effectiveness in terms of the size and coverage of the test suite generated automatically for each class. We extend the size of the feature space by introducing a new approach to defining sub-metrics in terms of existing source code metrics. Using feature importance analysis on the learned prediction models, we sort source code metrics in the order of their impact on test effectiveness. As a result, we found class strict cyclomatic complexity to be the most influential source code metric. Our experiments with the prediction models on a large corpus of Java projects containing about 23,000 classes demonstrate a Mean Absolute Error (MAE) of 0.032, a Mean Squared Error (MSE) of 0.004, and an R² score of 0.855. Compared with the state-of-the-art coverage prediction models, our models improve MAE, MSE, and R² score by 5.78%, 2.84%, and 20.71%, respectively.
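The evaluation measures named in this abstract (MAE, MSE, and the coefficient of determination) have standard definitions, sketched below in plain Python. The abstract does not say how the four regression models are combined, so the simple averaging in `ensemble_predict` is an assumption made for illustration, and all function names are hypothetical.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between labels and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean squared error between labels and predictions."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def ensemble_predict(models, x):
    """Assumed combination rule: average the outputs of the base regressors.

    `models` is a list of callables mapping a feature vector to a number.
    """
    return mean(m(x) for m in models)


if __name__ == "__main__":
    # Toy labels and predictions, not data from the paper.
    labels = [0.2, 0.4, 0.6, 0.8]
    preds = [0.25, 0.35, 0.65, 0.75]
    print(mae(labels, preds), mse(labels, preds), r2_score(labels, preds))
```

Lower MAE and MSE and a higher coefficient of determination indicate a better fit, which is the direction of the improvements the abstract reports.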

2.0 | LG | Nov 13, 2023

Mitigating Backdoors within Deep Neural Networks in Data-limited Configuration

Soroush Hashemifar, Saeed Parsa, Morteza Zakeri-Nasrabadi

As the capacity of deep neural networks (DNNs) increases, their need for huge amounts of data grows significantly. A common practice is to outsource the training process or collect more data over the Internet, which introduces the risk of a backdoored DNN. A backdoored DNN shows normal behavior on clean data while behaving maliciously once a trigger is injected into a sample at test time. In such cases, the defender faces multiple difficulties. First, the available clean dataset may not be sufficient for fine-tuning and recovering the backdoored DNN. Second, it is impossible to recover the trigger in many real-world applications without information about it. In this paper, we formulate some characteristics of poisoned neurons as a backdoor suspiciousness score, which can rank network neurons according to their activation values, weights, and their relationship with other neurons in the same layer. Our experiments indicate that the proposed method decreases the chance of attacks being successful by more than 50% with a tiny clean dataset, i.e., ten clean samples for the CIFAR-10 dataset, without significantly deteriorating the model's performance. Moreover, the proposed method runs three times as fast as the baselines.

2.1 | AI | Oct 29, 2023 | Code

Path Analysis for Effective Fault Localization in Deep Neural Networks

Soroush Hashemifar, Saeed Parsa, Akram Kalaee

Deep learning has revolutionized numerous fields, yet the reliability of Deep Neural Networks (DNNs) remains a concern due to their complexity and data dependency. Traditional software fault localization methods, such as Spectrum-based Fault Localization (SBFL), have been adapted for DNNs but often fall short in effectiveness. These methods typically overlook the propagation of faults through neural pathways, resulting in less precise fault detection. Research indicates that examining neural pathways, rather than individual neurons, is crucial because issues in one neuron can affect its entire pathway. By investigating these interconnected pathways, we can better identify and address problems arising from the collective activity of neurons. To address this limitation, we introduce the NP-SBFL method, which leverages Layer-wise Relevance Propagation (LRP) to identify essential faulty neural pathways. Our method explores multiple fault sources to accurately pinpoint faulty neurons by analyzing their interconnections. Additionally, our multi-stage gradient ascent (MGA) technique, an extension of gradient ascent (GA), enables sequential neuron activation to enhance fault detection. We evaluated NP-SBFL-MGA on the well-established MNIST and CIFAR-10 datasets, comparing it to other methods like DeepFault and NP-SBFL-GA, as well as three neuron measures: Tarantula, Ochiai, and Barinel. Our evaluation utilized all training and test samples (60,000 for MNIST and 50,000 for CIFAR-10) and revealed that NP-SBFL-MGA significantly outperformed the baselines in identifying suspicious pathways and generating adversarial inputs. Notably, Tarantula with NP-SBFL-MGA achieved a remarkable 96.75% fault detection rate compared to DeepFault's 89.90%. NP-SBFL-MGA highlights a strong correlation between critical path coverage and the number of failed tests in DNN fault localization.
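For readers unfamiliar with the suspiciousness measures named above, the following is a minimal Python sketch of the classic Tarantula and Ochiai formulas from spectrum-based fault localization. It is not the NP-SBFL implementation. Treating each program element as a neuron, with counts of failing and passing inputs on which it was active, is an assumption made for illustration, and the function and variable names are hypothetical.

```python
from math import sqrt


def tarantula(ef, ep, total_failed, total_passed):
    """Classic Tarantula score.

    ef / ep: number of failing / passing runs that exercised the element.
    """
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def ochiai(ef, ep, total_failed, total_passed):
    """Classic Ochiai score: ef / sqrt(total_failed * (ef + ep))."""
    denom = sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0


def rank(spectrum, total_failed, total_passed, measure):
    """Order elements from most to least suspicious.

    `spectrum` maps an element name to its (ef, ep) counts.
    """
    scores = {
        name: measure(ef, ep, total_failed, total_passed)
        for name, (ef, ep) in spectrum.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy spectrum over three hypothetical neurons, 10 failing and 10 passing runs.
    spectrum = {"n1": (8, 2), "n2": (1, 9), "n3": (5, 5)}
    print(rank(spectrum, 10, 10, tarantula))
    print(rank(spectrum, 10, 10, ochiai))
```

Both measures rate an element as more suspicious the more it is exercised by failing runs relative to passing ones; they differ only in how the counts are normalized.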

1.8 | SE | Mar 26, 2024 | Code

Natural Language Requirements Testability Measurement Based on Requirement Smells

Morteza Zakeri-Nasrabadi, Saeed Parsa

Requirements form the basis for defining software systems' obligations and tasks. Testable requirements help prevent failures, reduce maintenance costs, and make it easier to perform acceptance tests. However, despite the importance of measuring and quantifying requirements testability, no automatic approach for measuring it has been proposed based on requirements smells, which are at odds with requirements testability. This paper presents a mathematical model to evaluate and rank the testability of natural language requirements based on an extensive set of nine requirements smells, detected automatically, and on acceptance test effort determined by requirement length and application domain. Most of the smells stem from uncountable adjectives and from context-sensitive and ambiguous words. A comprehensive dictionary is required to detect such words. We offer a neural word-embedding technique to generate such a dictionary automatically. Using the dictionary, we could automatically detect the Polysemy smell (domain-specific ambiguity) for the first time in 10 application domains. Our empirical study on nearly 1000 software requirements from six well-known industrial and academic projects demonstrates that the proposed smell detection approach outperforms Smella, a state-of-the-art tool, in detecting requirements smells. The precision and recall of smell detection are improved by an average of 0.03 and 0.33, respectively, compared to the state-of-the-art. The proposed requirement testability model measures the testability of 985 requirements with a mean absolute error of 0.12 and a mean squared error of 0.03, demonstrating the model's potential for practical use.

4.9 | SE | Mar 25, 2018

Kernel-based Detection of Coincidentally Correct Test Cases to Improve Fault Localization Effectiveness

Farid Feyzi, Saeed Parsa

Although empirical studies have confirmed the effectiveness of spectrum-based fault localization (SBFL) techniques, their performance may be degraded by undesired circumstances such as coincidental correctness (CC), where one or more passing test cases exercise a faulty statement, making it harder to decide whether the exercised statement is faulty. This article aims to improve SBFL effectiveness by mitigating the effect of CC test cases. To this end, a new method is proposed that uses a support vector machine (SVM) with a customized kernel function. To build the kernel function, we applied a new sequence-matching algorithm that measures the similarities between passing and failing executions. We conducted experiments to assess the proposed method. The results show that our method can effectively improve the performance of SBFL techniques.

5.2 | SE | Dec 9, 2017

FPA-FL: Incorporating Static Fault-proneness Analysis into Statistical Fault Localization

Farid Feyzi, Saeed Parsa

Despite the proven applicability of statistical methods in automatic fault localization, these approaches are biased by data collected from different executions of the program. This bias could result in unstable statistical models, which may vary depending on the test data provided for trial executions of the program. To resolve this difficulty, this article proposes FPA-FL, a new fault-proneness-aware statistical approach based on Elastic-Net regression. The main idea behind FPA-FL is to consider the static structure and the fault-proneness of the program statements in addition to their dynamic correlations with the program termination state. The grouping effect of FPA-FL is helpful for finding multiple faults and supporting scalability. To provide the context of failure, cause-effect chains of program faults are discovered. FPA-FL is evaluated from different viewpoints on well-known test suites. The results reveal the high fault localization performance of our approach compared with similar techniques in the literature.

2.9 | SE | Jul 9, 2017

Validation of Collaborative Business Processes using Goals Model

Amir Ebrahimifard, Mostafa Khoramabadi Arani, Mohammad Javad Amiri et al.

Validating a process model against its corresponding requirements is one of the most important problems in the domain of collaborative processes. In this paper, collaborative processes are modeled using the interaction view of the BPMN 2.0 standard. Then, requirements are extracted with a goal modeling technique. Different scenarios of each requirement show possible paths for the system. These paths are modeled by sequence diagrams, and collaborative processes are validated against the corresponding requirements using the Savara tool.

7.9 | SE | Dec 17, 2016

FPA-Debug: Effective Statistical Fault Localization Considering Fault-proneness Analysis

Farid Feyzi, Esmaeel Nikravan, Saeed Parsa

The aim is to identify faulty predicates that have a strong effect on program failure. Statistical debugging techniques are among the best methods for pinpointing defects within program source code. However, they have some drawbacks: they require a large number of executions to identify faults, they might be adversely affected by coincidental correctness, and they do not take into consideration the fault-proneness associated with different parts of the program code while constructing behavioral models. Additionally, they do not consider the simultaneous impact of predicates on program termination status. To deal with these problems, this paper proposes FPA-Debug, a new fault-proneness-aware approach based on elastic net regression. FPA-Debug employs a clustering-based strategy to alleviate coincidental correctness in fault localization and finds the smallest effective subset of program predicates, known as bug predictors. Moreover, the approach considers the fault-proneness of code during statistical modeling by applying a different regularization parameter to each program predicate depending on its location within the program source code. The experimental results on the well-known Siemens test suite reveal the effectiveness and accuracy of FPA-Debug.
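As a rough illustration of what a predicate-specific regularization parameter could look like, the block below writes the standard elastic net objective with a separate penalty weight per coefficient. The symbols are assumptions introduced here, not notation from the paper: $x_i$ is the vector of predicate observations in run $i$, $y_i$ its termination status, $\ell$ a generic loss, $\alpha \in [0, 1]$ the mix between the two penalties, and $\lambda_j$ a weight for predicate $j$ that would be set from its fault-proneness. Setting every $\lambda_j$ to the same value recovers the ordinary elastic net.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{n} \sum_{i=1}^{n} \ell\!\left(y_i,\; x_i^{\top}\beta\right)
  \;+\; \sum_{j=1}^{p} \lambda_j
  \left( \alpha\,\lvert \beta_j \rvert \;+\; \frac{1-\alpha}{2}\,\beta_j^{2} \right)
```

Under this reading, a smaller $\lambda_j$ penalizes predicate $j$ less, so predicates in more fault-prone code are more likely to keep a nonzero coefficient and be reported as bug predictors.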

4.0 | SE | Jun 11, 2014

A new approach for formal behavioral modeling of protection services in antivirus systems

Monire Norouzi, Saeed Parsa, Ali Mahjur

Formal methods provide a suitable platform for software development. Formal methods and formal verification are necessary to prove the correctness and improve the performance of software systems at various levels of design and implementation. Security is an important issue in computer systems. Since antivirus applications play a very important role in computer system security, verifying these applications is essential. In this paper, we present four new approaches for antivirus system behavior and propose a behavioral model of protection services in the antivirus system. We divide the behavioral model into preventive behavior and control behavior and then formalize these behaviors. Finally, using some definitions, we explain how these behaviors are mapped onto each other by our new approaches.