Mohammad Azzeh

h-index 27

26 papers

895 citations

Novelty 34%

AI Score 23

Ranked #173,500 of 194,257 authors (top 89%); #2,136 in SE (top 70%)

26 Papers

2.3 CY Sep 12, 2022

Predicting students' learning styles using regression techniques

Ahmad Mousa Altamimi, Mohammad Azzeh, Mahmoud Albashayreh

Traditional learning systems have responded quickly to the COVID pandemic and moved to online or distance learning. Online learning requires a personalization method because the interaction between learners and instructors is minimal, and learners have a specific learning method that works best for them. One of the personalization methods is detecting the learners' learning style. To detect learning styles, several works have been proposed using classification techniques. However, the current detection models become ineffective when learners have no dominant style or a mix of learning styles. Thus, the objective of this study is twofold: first, to construct a prediction model based on regression analysis that provides a probabilistic approach for inferring the preferred learning style; second, to compare regression models and classification models for detecting learning style. To ground our conceptual model, a set of machine learning algorithms has been implemented based on a dataset collected from a sample of 72 students using the visual, auditory, reading/writing, and kinesthetic (VARK) inventory questionnaire. Results show that regression techniques are more accurate and representative for real-world scenarios than classification algorithms, where students might have multiple learning styles but with different probabilities. We believe that this research will help educational institutes to engage learning styles in the teaching process.

1.8 LG Sep 10, 2022

Application of Machine Learning for Online Reputation Systems

Ahmad Alqwadri, Mohammad Azzeh, Fadi Almasalha

Users on the internet usually require venues to provide better purchasing recommendations. This can be provided by a reputation system that processes ratings to provide recommendations. The rating aggregation process is a main part of a reputation system and produces a global opinion about product quality. Naive methods that are frequently used do not consider consumer profiles in their calculations and cannot discover unfair ratings and trends emerging in new ratings. Other sophisticated rating aggregation methods that use a weighted average technique focus on one or a few aspects of consumer profile data. This paper proposes a new reputation system using machine learning to predict the reliability of consumers from their consumer profiles. In particular, we construct a new consumer profile dataset by extracting a set of factors that have great impact on consumer reliability, which serve as an input to machine learning algorithms. The predicted weight is then integrated with a weighted average method to compute the product reputation score. The proposed model has been evaluated over three MovieLens benchmarking datasets, using 10-fold cross-validation. Furthermore, the performance of the proposed model has been compared to previously published rating aggregation models. The obtained results were promising, which suggests that the proposed approach could be a potential solution for reputation systems. The results of the comparison demonstrated the accuracy of our models. Finally, the proposed approach can be integrated with online recommendation systems to provide better purchasing recommendations and facilitate user experience on online shopping markets.
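
The weighted-average aggregation step described in this abstract can be illustrated with a minimal sketch. The function name `reputation_score`, the sample ratings, and the sample reliability weights are all hypothetical; the abstract does not specify how the predicted weights are scaled, so the sketch simply assumes non-negative weights.

```python
def reputation_score(ratings, weights):
    """Weighted average of ratings; each weight is a consumer's reliability."""
    if len(ratings) != len(weights):
        raise ValueError("ratings and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return sum(r * w for r, w in zip(ratings, weights)) / total_weight


# Hypothetical ratings on a 1-5 scale and hypothetical predicted reliabilities.
ratings = [5, 4, 1]
weights = [0.9, 0.8, 0.1]
score = reputation_score(ratings, weights)
plain_mean = sum(ratings) / len(ratings)
```

In this made-up example the low-reliability rating of 1 pulls the weighted score down far less than it pulls the plain mean, which is the effect a reliability-weighted aggregation is meant to have.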

3.3 LG Sep 10, 2022

Examining stability of machine learning methods for predicting dementia at early phases of the disease

Sinan Faouri, Mahmood AlBashayreh, Mohammad Azzeh

Dementia is a neuropsychiatric brain disorder that usually occurs when one or more brain cells stop working partially or completely. Diagnosis of this disorder in the early phases of the disease is a vital task to rescue patients' lives from bad consequences and provide them with better healthcare. Machine learning methods have been proven to be accurate in predicting dementia in the early phases of the disease. The prediction of dementia depends heavily on the type of collected data, which are usually gathered from Normalized Whole Brain Volume (nWBV) and Atlas Scaling Factor (ASF), which are normally measured and corrected from Magnetic Resonance Imaging (MRI). Other biological features such as age and gender can also help in the diagnosis of dementia. Although many studies use machine learning for predicting dementia, we could not reach a conclusion on the stability of these methods, that is, on which one is more accurate under different experimental conditions. Therefore, this paper investigates the conclusion stability regarding the performance of machine learning algorithms for dementia prediction. To accomplish this, a large number of experiments were run using 7 machine learning algorithms and two feature reduction algorithms, namely Information Gain (IG) and Principal Component Analysis (PCA). To examine the stability of these algorithms, thresholds of feature selection were changed for the IG from 20% to 100% and the PCA dimension from 2 to 8. This has resulted in 7×9 + 7×7 = 112 experiments. In each experiment, various classification evaluation data were recorded. The obtained results show that among the seven algorithms, the support vector machine and Naive Bayes are the most stable algorithms while changing the selection threshold. Also, it was found that using IG would seem more efficient than using PCA for predicting dementia.
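
The experiment count in this abstract can be checked by enumerating the grid. The sketch below assumes the IG threshold was varied in steps of 10 percentage points, which is the step that yields the nine settings implied by "7×9"; the abstract does not state the step explicitly.

```python
from itertools import product

N_ALGORITHMS = 7  # seven classifiers are reported; their names are not needed here

# Assumed step of 10 percentage points, giving nine IG thresholds.
ig_thresholds = list(range(20, 101, 10))  # 20, 30, ..., 100
pca_dimensions = list(range(2, 9))        # 2, 3, ..., 8

# One run per (algorithm, setting) pair for each feature reduction method.
ig_runs = list(product(range(N_ALGORITHMS), ig_thresholds))
pca_runs = list(product(range(N_ALGORITHMS), pca_dimensions))
total_runs = len(ig_runs) + len(pca_runs)
```

Under that assumption, `len(ig_runs)` is 63 and `len(pca_runs)` is 49, matching the 112 experiments reported.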

2.7 IV Sep 10, 2022

An Interactive Automation for Human Biliary Tree Diagnosis Using Computer Vision

Mohammad AL-Oudat, Saleh Alomari, Hazem Qattous et al.

The biliary tree is a network of tubes that connects the liver to the gallbladder, an organ right beneath it. The bile duct is the major tube in the biliary tree. The dilatation of a bile duct is a key indicator of more serious problems in the human body, such as stones and tumors, which are frequently caused by the pancreas or the papilla of Vater. The detection of bile duct dilatation can be challenging for beginner or untrained medical personnel in many circumstances. Even professionals are unable to detect bile duct dilatation with the naked eye. This research presents a unique vision-based model for initial diagnosis of the biliary tree. To segment the biliary tree from the Magnetic Resonance Image (MRI), the framework used different image processing approaches. After the image's region of interest was segmented, numerous calculations were performed on it to extract 10 features, including major and minor axes, bile duct area, biliary tree area, compactness, and some textural features (contrast, mean, variance and correlation). This study used a database of images from King Hussein Medical Center in Amman, Jordan, which included 200 MRI images: 100 normal cases and 100 patients with dilated bile ducts. After the features are extracted, various classifiers are used to determine the patients' condition (normal or dilated). The findings demonstrate that the extracted features perform well with all classifiers in terms of accuracy and area under the curve. This study is unique in that it uses an automated approach to segment the biliary tree from MRI images and correlates the extracted features with biliary tree status, which has not been done before in the literature.

3.3 LG Dec 29, 2021

Artificial Intelligence and Statistical Techniques in Short-Term Load Forecasting: A Review

Ali Bou Nassif, Bassel Soudan, Mohammad Azzeh et al.

Electrical utilities depend on short-term demand forecasting to proactively adjust production and distribution in anticipation of major variations. This systematic review analyzes 240 works published in scholarly journals between 2000 and 2019 that focus on applying Artificial Intelligence (AI), statistical, and hybrid models to short-term load forecasting (STLF). This work represents the most comprehensive review of works on this subject to date. A complete analysis of the literature is conducted to identify the most popular and accurate techniques as well as existing gaps. The findings show that although Artificial Neural Networks (ANN) continue to be the most commonly used standalone technique, researchers have been increasingly opting for hybrid combinations of different techniques to leverage the combined advantages of individual methods. The review demonstrates that it is commonly possible with these hybrid combinations to achieve prediction accuracy exceeding 99%. The most successful setting for short-term forecasting has been identified as prediction for a duration of one day at an hourly interval. The review has identified a deficiency in access to datasets needed for training of the models. A significant gap has been identified in researching regions other than Asia, Europe, North America, and Australia.

3.6 SE Feb 11, 2021

Empirical Analysis on Productivity Prediction and Locality for Use Case Points Method

Mohammad Azzeh, Ali Bou Nassif, Cuauhtemoc Lopez Martin

The Use Case Points (UCP) method has been around for over two decades. Although there was substantial criticism concerning the algebraic construction and factors assessment of UCP, it remains an efficient early size estimation method. Predicting software effort from UCP is still an ever-present challenge. The earlier version of the UCP method suggested using productivity as a cost driver, where fixed or a few pre-defined productivity ratios have been widely agreed. While this approach was successful when not enough historical data was available, it is no longer acceptable because software projects are different in terms of development aspects. Therefore, it is better to understand the relationship between productivity and other UCP variables. This paper examines the impact of data locality approaches on productivity and effort prediction from multiple UCP variables. The environmental factors are used as partitioning factors to produce local homogeneous data either based on their influential levels or using clustering algorithms. Different machine learning methods, including solo and ensemble methods, are used to construct productivity and effort prediction models based on the local data. The results demonstrate that the prediction models that are created based on local data surpass models that use the entire data. Also, the results show that conforming to the hypothetical assumption between productivity and environmental factors is not necessarily a requirement for the success of locality.

5.3 SE Dec 13, 2020

Predicting Software Effort from Use Case Points: A Systematic Review

Mohammad Azzeh, Ali Bou Nassif, Imtinan Attili

Context: Predicting software project effort from the Use Case Points (UCP) method is increasingly used among researchers and practitioners. However, unlike other effort estimation domains, this area of interest has not been systematically reviewed. Aims: There is a need for a systematic literature review to provide directions and support for this research area of effort estimation. Specifically, the objective of this study is twofold: to classify UCP effort estimation papers based on four criteria: contribution type, research approach, dataset type and techniques used with UCP; and to analyze these papers from different views: estimation accuracy, favorable estimation context and impact of combined techniques on the accuracy of UCP. Method: We used the systematic literature review methodology proposed by Kitchenham and Charters. This includes searching for the most relevant papers, selecting quality papers, extracting data and drawing results. Result: The authors of UCP research papers are generally not aware of previously published results and conclusions in the field of UCP effort estimation. There is a lack of UCP related publications in the top software engineering journals. This leads to the conclusion that such papers are not useful for the community. Furthermore, most articles used small numbers of projects, which cannot support generalizing the conclusion in most cases. Conclusions: There are multiple research directions for the UCP method that have not been examined so far, such as validating the algebraic construction of UCP based on industrial data. Also, there is a need for standard automated tools that govern the process of translating a use case diagram into its corresponding UCP metrics. Although there is an increasing interest among researchers in collecting industrial data and building effort prediction models based on machine learning methods, the quality of data is still subject to debate.

6.9 SE Feb 10, 2019

Software Development Effort Estimation Using Regression Fuzzy Models

Ali Bou Nassif, Mohammad Azzeh, Ali Idri et al.

Software effort estimation plays a critical role in project management. Erroneous results may lead to overestimating or underestimating effort, which can have catastrophic consequences on project resources. Machine-learning techniques are increasingly popular in the field. Fuzzy logic models, in particular, are widely used to deal with imprecise and inaccurate data. The main goal of this research was to design and compare three different fuzzy logic models for predicting software estimation effort: Mamdani, Sugeno with constant output and Sugeno with linear output. To assist in the design of the fuzzy logic models, we conducted regression analysis, an approach we call regression fuzzy logic. State-of-the-art and unbiased performance evaluation criteria such as standardized accuracy, effect size and mean balanced relative error were used to evaluate the models, as well as statistical tests. Models were trained and tested using industrial projects from the International Software Benchmarking Standards Group (ISBSG) dataset. Results showed that data heteroscedasticity affected model performance. Fuzzy logic models were found to be very sensitive to outliers. We concluded that when regression analysis was used to design the model, the Sugeno fuzzy inference system with linear output outperformed the other models.

2.2 LG Dec 16, 2018

Ensemble of Learning Project Productivity in Software Effort Based on Use Case Points

Mohammad Azzeh, Ali Bou Nassif, Shadi Banitaan et al.

It is well recognized that project productivity is a key driver in estimating software project effort from the Use Case Point size metric at early software development stages. Although there are a few proposed models for predicting productivity, there is no consistent conclusion regarding which model is superior. Therefore, instead of building a new productivity prediction model, this paper presents a new ensemble construction mechanism applied for software project productivity prediction. An ensemble is an effective technique when the performance of base models is poor. We proposed a weighted mean method to aggregate predicted productivities based on the average of errors produced by the training model. The obtained results show that using an ensemble is a good alternative approach when base models are not consistently accurate over different datasets, and when models behave diversely.
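
A weighted-mean ensemble of this kind can be sketched as follows. The abstract does not give the weighting formula, so the sketch assumes one common choice: each base model's weight is inversely proportional to its average training error. The function names and the numbers are hypothetical.

```python
def inverse_error_weights(training_errors):
    """Normalized weights; a smaller average training error gives a larger weight.

    Assumption: weights are proportional to 1 / error, and every error is > 0.
    """
    inverses = [1.0 / e for e in training_errors]
    total = sum(inverses)
    return [v / total for v in inverses]


def ensemble_productivity(predictions, training_errors):
    """Weighted mean of the base models' productivity predictions."""
    weights = inverse_error_weights(training_errors)
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical productivity predictions from three base models and
# their hypothetical average training errors.
predictions = [18.0, 22.0, 30.0]
training_errors = [2.0, 4.0, 8.0]
weights = inverse_error_weights(training_errors)
combined = ensemble_productivity(predictions, training_errors)
```

With these made-up inputs the weights are 4/7, 2/7 and 1/7, so the combined prediction stays closest to the model with the smallest training error.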

2.8 SE Dec 15, 2018

v-SVR Polynomial Kernel for Predicting the Defect Density in New Software Projects

Cuauhtemoc Lopez-Martin, Mohammad Azzeh, Ali Bou Nassif et al.

An important product measure to determine the effectiveness of software processes is the defect density (DD). In this study, we propose the application of support vector regression (SVR) to predict the DD of new software projects obtained from the International Software Benchmarking Standards Group (ISBSG) Release 2018 data set. Two types of SVR (e-SVR and v-SVR) were applied to train and test these projects. Each SVR used four types of kernels. The prediction accuracy of each SVR was compared to that of a statistical regression (i.e., a simple linear regression, SLR). A statistical significance test showed that the prediction accuracy of v-SVR with polynomial kernel was better than that of SLR when new software projects were developed on mainframes and coded in third-generation programming languages.

0.8 LG Nov 26, 2018

Machine Learning Classifications of Coronary Artery Disease

Ali Bou Nassif, Omar Mahdi, Qassim Nasir et al.

Coronary Artery Disease (CAD) is one of the leading causes of death worldwide, and so it is very important to correctly diagnose patients with the disease. For medical diagnosis, machine learning is a useful tool; however, features and algorithms must be carefully selected to get accurate classification. To this effect, three feature selection methods have been used on 13 input features from the Cleveland dataset with 297 entries, and 7 were selected. The selected features were used to train three different classifiers, which are SVM, Naïve Bayes and KNN, using 10-fold cross-validation. The resulting models were evaluated using Accuracy, Recall, Specificity and Precision. It is found that the Naïve Bayes classifier performs the best on this dataset and features, outperforming or matching SVM and KNN in all four evaluation parameters used and achieving an accuracy of 84%.
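
The four evaluation measures named in this abstract are standard functions of a binary confusion matrix. The sketch below shows their usual definitions; the function name and the counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),        # true positive rate (sensitivity)
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }


# Hypothetical counts for 100 test cases, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Reporting all four together matters in a diagnostic setting because accuracy alone hides whether the errors are missed patients (low recall) or false alarms (low specificity).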

8.7 SE May 28, 2017

Analyzing the Relationship between Project Productivity and Environment Factors in the Use Case Points Method

Mohammad Azzeh, Ali Bou Nassif

Project productivity is a key factor for producing effort estimates from Use Case Points (UCP), especially when the historical dataset is absent. The first versions of UCP effort estimation models used a fixed number or very limited numbers of productivity ratios for all new projects. These approaches have not been well examined over a large number of projects so the validity of these studies was a matter for criticism. The newly available large software datasets allow us to perform further research on the usefulness of productivity for effort estimation of software development. Specifically, we studied the relationship between project productivity and UCP environmental factors, as they have a significant impact on the amount of productivity needed for a software project. Therefore, we designed four studies, using various classification and regression methods, to examine the usefulness of that relationship and its impact on UCP effort estimation. The results we obtained are encouraging and show potential improvement in effort estimation. Furthermore, the efficiency of that relationship is better over a dataset that comes from industry because of the quality of data collection. Our comment on the findings is that it is better to exclude environmental factors from calculating UCP and make them available only for computing productivity. The study also encourages project managers to understand how to better assess the environmental factors as they do have a significant impact on productivity
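
For readers unfamiliar with the fixed-productivity approach this abstract refers to, the sketch below shows the classical UCP calculation as it is commonly presented (Karner's formulation): size is the unadjusted weight scaled by a technical and an environmental factor, and effort is size times a productivity ratio. The function names and input values are hypothetical, and the default of 20 person-hours per UCP is the commonly cited fixed ratio, not a value taken from this paper.

```python
def use_case_points(uucw, uaw, tf, ef):
    """Classical UCP size.

    uucw: unadjusted use case weight, uaw: unadjusted actor weight,
    tf: weighted sum of technical factors, ef: weighted sum of environmental factors.
    """
    tcf = 0.6 + 0.01 * tf  # technical complexity factor
    ecf = 1.4 - 0.03 * ef  # environmental complexity factor
    return (uucw + uaw) * tcf * ecf


def effort_hours(ucp, productivity=20.0):
    """Effort in person-hours; productivity is person-hours per UCP."""
    return ucp * productivity


# Hypothetical project values, for illustration only.
ucp = use_case_points(uucw=100, uaw=10, tf=40, ef=20)
effort = effort_hours(ucp)
```

Because the environmental factors enter only through `ecf`, dropping them from the size calculation and using them to predict `productivity` instead, as the abstract recommends, amounts to moving their influence from the first function to the argument of the second.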

3.3 CY Apr 10, 2017

A Comparative Study for Predicting Heart Diseases Using Data Mining Classification Methods

Israa Ahmed Zriqat, Ahmad Mousa Altamimi, Mohammad Azzeh

Improving the precision of heart disease detection has been investigated by many researchers in the literature. Such improvement is motivated by overwhelming health care expenditures and erroneous diagnoses. As a result, various methodologies have been proposed to analyze the disease factors, aiming to decrease physicians' practice variation and reduce medical costs and errors. In this paper, our main motivation is to develop an effective intelligent medical decision support system based on data mining techniques. In this context, five data mining classification algorithms, with large datasets, have been utilized to assess and analyze the risk factors statistically related to heart diseases in order to compare the performance of the implemented classifiers (e.g., Naïve Bayes, Decision Tree, Discriminant, Random Forest, and Support Vector Machine). To underscore the practical viability of our approach, the selected classifiers have been implemented using the MATLAB tool with two datasets. Results of the conducted experiments showed that all classification algorithms are predictive and can give relatively correct answers. However, the decision tree outperforms other classifiers with an accuracy rate of 99.0%, followed by Random Forest. That is the case because both of them have relatively the same mechanism, but the Random Forest builds an ensemble of decision trees. Although ensemble learning has been shown to produce superior results, in our case the decision tree has outperformed its ensemble version.

2.9 SE Mar 13, 2017

Software stage-effort estimation based on association rule mining and fuzzy set theory

Mohammad Azzeh, Peter I Cowling, Daniel Neagu

Relying on early effort estimation to predict the required number of resources is often not sufficient, and could lead to under- or over-estimation. It is widely acknowledged that the software development process should be refined regularly and that a software prediction made at an early stage of software development is still a kind of guess. Even good predictions are not sufficient given inherent uncertainty and risks. Stage-effort estimation allows the project manager to re-allocate the correct number of resources, re-schedule the project and control project progress to finish on time and within budget. In this paper we propose an approach to utilize prior effort records to predict stage effort. The proposed model combines concepts of fuzzy set theory and association rule mining. The results were good in terms of prediction accuracy and have the potential to deliver good stage-effort estimation.

7.1 SE Mar 13, 2017

Software effort estimation based on optimized model tree

Mohammad Azzeh

Background: It is widely recognized that software effort estimation is a regression problem. Model Tree (MT) is one of the Machine Learning based regression techniques that is useful for software effort estimation, but like other machine learning algorithms, the MT has a large configuration space and requires careful setting of its parameters. The choice of such parameters is dataset dependent, so no general guideline can govern this process, which forms the motivation of this work. Aims: This study investigates the effect of using a recent optimization algorithm called the Bees algorithm to specify the optimal choice of MT parameters that fit a dataset and therefore improve prediction accuracy. Method: We used MT with optimal parameters identified by the Bees algorithm to construct a software effort estimation model. The model has been validated over eight datasets that come from two main sources: PROMISE and ISBSG. We also used 3-fold cross-validation to empirically assess the prediction accuracies of different estimation models. As a benchmark, results are also compared to those obtained with Stepwise Regression, Case-Based Reasoning and Multi-Layer Perceptron. Results: The results obtained from the combination of MT and the Bees algorithm are encouraging and outperform other well-known estimation methods applied on the employed datasets. They are also interesting enough to suggest the effectiveness of MT among the techniques that are suitable for effort estimation. Conclusions: The use of the Bees algorithm enabled us to automatically find the optimal MT parameters required to construct effort estimation models that fit each individual dataset. It also provided a significant improvement in prediction accuracy.

2.9 SE Mar 11, 2017

Dataset Quality Assessment: An extension for analogy based effort estimation

Mohammad Azzeh

Estimation by Analogy (EBA) is an increasingly active research method in the area of software engineering. The fundamental assumption of this method is that projects that are similar in terms of attribute values will also be similar in terms of effort values. It is well recognized that the quality of software datasets has a considerable impact on the reliability and accuracy of such a method. Therefore, if the software dataset does not satisfy the aforementioned assumption, then it is not very useful for the EBA method. This paper presents a new method based on Kendall's row-wise rank correlation that enables data quality evaluation and provides a data preprocessing stage for EBA. The proposed method provides a sound statistical basis and justification for the process of data quality evaluation. Unlike Analogy-X, our method has the ability to deal with categorical attributes individually without the need for partitioning the dataset. Experimental results showed that the proposed method could form a useful extension for EBA as it enables dataset quality evaluation, attribute selection and identification of abnormal observations.

2.9 SE Mar 11, 2017

An Optimized Analogy-Based Project Effort Estimation

Mohammad Azzeh, Yousef Elsheikh, Marwan Alseid

Despite the predictive performance of Analogy-Based Estimation (ABE) in generating better effort estimates, there is no consensus on how to predict the best number of analogies, and which adjustment technique produces better estimates. This paper proposes a new adjusted ABE model based on optimizing and approximating complex relationships between features and reflects that approximation on the final estimate. The results show that the predictive performance of ABE has noticeably been improved, and the number of analogies was remarkably variable for each test project.

12.5 SE Mar 11, 2017

Analogy-based effort estimation: a new method to discover set of analogies from dataset characteristics

Mohammad Azzeh, Ali Bou Nassif

Analogy-based effort estimation (ABE) is one of the efficient methods for software effort estimation because of its outstanding performance and capability of handling noisy datasets. Conventional ABE models usually use the same number of analogies for all projects in the datasets in order to make good estimates. The authors' claim is that using the same number of analogies may produce the overall best performance for the whole dataset but not necessarily the best performance for each individual project. Therefore, there is a need to better understand the dataset characteristics in order to discover the optimum set of analogies for each project rather than using a static k nearest projects. Method: We propose a new technique based on the Bisecting k-medoids clustering algorithm to come up with the best set of analogies for each individual project before making the prediction. Results & Conclusions: With Bisecting k-medoids it is possible to better understand the dataset characteristics and automatically find the best set of analogies for each test project. Performance figures of the proposed estimation method are promising and better than those of other regular ABE models.
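
For context, the conventional fixed-k baseline that this abstract argues against can be sketched in a few lines: find the k historical projects closest to the new one and average their efforts. This is not the Bisecting k-medoids technique the paper proposes. The function name, the Euclidean distance, the unweighted mean, and the sample data are all assumptions made for illustration; in practice features would normally be scaled before measuring distance.

```python
import math


def estimate_by_analogy(history, target, k=3):
    """Mean effort of the k historical projects nearest to target.

    history: list of (feature_tuple, effort) pairs.
    target: feature tuple of the project to estimate.
    """
    ranked = sorted(history, key=lambda project: math.dist(project[0], target))
    nearest = ranked[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


# Hypothetical historical projects: (size, team_size) -> effort.
history = [
    ((10.0, 2.0), 100.0),
    ((12.0, 3.0), 130.0),
    ((50.0, 9.0), 600.0),
    ((11.0, 2.0), 110.0),
]
estimate = estimate_by_analogy(history, (11.0, 2.5), k=3)
```

With k fixed at 3 the outlying project is excluded here, but with k = 4 it would be averaged in and inflate the estimate, which illustrates why a single k for every project can be a poor fit.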

8.7 SE Mar 11, 2017

Fuzzy Model Tree For Early Effort Estimation

Mohammad Azzeh, Ali Bou Nassif

Use Case Points (UCP) is a well-known method to estimate the project size, based on the Use Case diagram, at early phases of software development. Although the Use Case diagram is widely accepted as a de facto model for analyzing object-oriented software requirements around the world, the UCP method has not received sufficient attention because, as yet, there is no consensus on how to produce software effort from UCP. This paper aims to study the potential of using Fuzzy Model Tree to derive effort estimates based on the UCP size measure using a dataset collected for that purpose. The proposed approach has been validated against the Treeboost model, Multiple Linear Regression and classical effort estimation based on the UCP model. The obtained results are promising and show better performance than those obtained by classical UCP and Multiple Linear Regression, and slightly better than those obtained by the Treeboost model.

10.1 SE Mar 11, 2017

Model tree based adaption strategy for software effort estimation by analogy

Mohammad Azzeh

Background: Adaptation technique is a crucial task for analogy based estimation. Current adaptation techniques often use linear size or linear similarity adjustment mechanisms which are often not suitable for datasets that have complex structure with many categorical attributes. Furthermore, the use of nonlinear adaptation technique such as neural network and genetic algorithms needs many user interactions and parameters optimization for configuring them (such as network model, number of neurons, activation functions, training functions, mutation, selection, crossover, ... etc.). Aims: In response to the abovementioned challenges, the present paper proposes a new adaptation strategy using Model Tree based attribute distance to adjust estimation by analogy and derive new estimates. Using Model Tree has an advantage to deal with categorical attributes, minimize user interaction and improve efficiency of model learning through classification. Method: Seven well known datasets have been used with 3-Fold cross validation to empirically validate the proposed approach. The proposed method has been investigated using various K analogies from 1 to 3. Results: Experimental results showed that the proposed approach produced better results when compared with those obtained by using estimation by analogy based linear size adaptation, linear similarity adaptation, 'regression towards the mean' and null adaptation. Conclusions: Model Tree could form a useful extension for estimation by analogy especially for complex data sets with large number of categorical attributes.

11.4 SE Mar 11, 2017

Learning best K analogies from data distribution for case-based software effort estimation

Mohammad Azzeh, Yousef Elsheikh

Case-Based Reasoning (CBR) has been widely used to generate good software effort estimates. The predictive performance of CBR is dataset dependent and subject to an extremely large space of configuration possibilities. Regardless of the type of adaptation technique, deciding on the optimal number of similar cases to be used before applying CBR is a key challenge. In this paper we propose a new technique based on the Bisecting k-medoids clustering algorithm to better understand the structure of a dataset and discover the optimal cases for each individual project by excluding irrelevant cases. Results obtained showed that understanding the data characteristics prior to the prediction stage can help in automatically finding the best number of cases for each test project. Performance figures of the proposed estimation method are better than those of other regular K-based CBR methods.

13.5 SE Mar 11, 2017

An empirical evaluation of ensemble adjustment methods for analogy-based effort estimation

Mohammad Azzeh, Ali Bou Nassif, Leandro L Minku

Objective: This paper investigates the potential of ensemble learning for variants of adjustment methods used in analogy-based effort estimation. The number k of analogies to be used is also investigated. Method: We perform a large-scale comparison study where many ensembles constructed from n out of 40 possible valid variants of adjustment methods are applied to eight datasets. The performance of each method was evaluated based on standardized accuracy and effect size. Results: The results have been subjected to statistical significance testing, and show reasonably significant improvements in predictive performance where ensemble methods are applied. Conclusion: Our conclusions suggest that ensembles of adjustment methods can work well and achieve good performance, even though they are not always superior to single methods. We also recommend constructing ensembles from only linear adjustment methods, as they have shown better performance and were frequently ranked higher.

18.2 SE Nov 29, 2016

Neural Network Models for Software Development Effort Estimation: A Comparative Study

Ali Bou Nassif, Mohammad Azzeh, Luiz Fernando Capretz et al.

Software development effort estimation (SDEE) is one of the main tasks in software project management. It is crucial for a project manager to efficiently predict the effort or cost of a software project in a bidding process, since overestimation will lead to bidding loss and underestimation will cause the company to lose money. Several SDEE models exist; machine learning models, especially neural network models, are among the most prominent in the field. In this study, four different neural network models (Multilayer Perceptron, General Regression Neural Network, Radial Basis Function Neural Network, and Cascade Correlation Neural Network) are compared with each other based on: (1) predictive accuracy centered on the Mean Absolute Error criterion, (2) whether such a model tends to overestimate or underestimate, and (3) how each model classifies the importance of its inputs. Industrial datasets from the International Software Benchmarking Standards Group (ISBSG) are used to train and validate the four models. The main ISBSG dataset was filtered and then divided into five datasets based on the productivity value of each project. Results show that the four models tend to overestimate in 80 percent of the datasets, and the significance of the model inputs varies based on the selected model. Furthermore, the Cascade Correlation Neural Network outperforms the other three models in the majority of the datasets constructed on the Mean Absolute Residual criterion.

10.1SENov 29, 2016

Pareto Efficient Multi Objective Optimization for Local Tuning of Analogy Based Estimation

Mohammad Azzeh, Ali Bou Nassif, Shadi Banitaan et al.

Analogy Based Effort Estimation (ABE) is one of the prominent methods for software effort estimation. The fundamental concept of ABE is close to the mentality of expert estimation, but with an automated procedure in which the final estimate is generated by reusing similar historical projects. The key issue when using ABE is how to adapt the effort of the retrieved nearest neighbors. The adaptation process is an essential part of ABE: it aims to produce more accurate estimates by tuning the selected raw solutions with some adaptation strategy. In this study we show that there are three interrelated decision variables that have a great impact on the success of the adaptation method: (1) the number of nearest analogies (k), (2) the optimum feature set needed for adaptation, and (3) the adaptation weights. To find the right decision regarding these variables, one needs to study all possible combinations and evaluate them individually to select the one that can improve all prediction evaluation measures. The existing evaluation measures usually behave differently, sometimes presenting opposite trends in evaluating prediction methods. This means that changing one decision variable could improve one evaluation measure while degrading the others. Therefore, the main theme of this research is how to come up with the best decision variables that improve the adaptation strategy, and thus the overall evaluation measures, without degrading the others. The impact of these decisions together has not been investigated before; therefore, we propose to view the building of the adaptation procedure as a multi-objective optimization problem. The Particle Swarm Optimization Algorithm (PSO) is utilized to find the optimum solutions for such decision variables based on optimizing multiple evaluation measures
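The idea of improving some evaluation measures without degrading the others corresponds to Pareto dominance. The sketch below filters a set of candidate configurations down to the non-dominated ones, assuming every measure is an error to be minimized. It shows only the dominance test, not the PSO search used in the paper, and the configuration names and scores are hypothetical.

```python
def dominates(a, b):
    """True if score tuple a is no worse than b on every measure and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }


# Hypothetical (error_measure_1, error_measure_2) pairs per configuration; lower is better.
candidates = {
    "k=1": (0.40, 0.55),
    "k=3": (0.35, 0.60),
    "k=5": (0.45, 0.65),
}

front = pareto_front(candidates)
print(sorted(front))
```

Here "k=5" is dropped because "k=1" is better on both measures, while "k=1" and "k=3" both remain because each wins on one measure.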

15.7SEOct 8, 2016

A Hybrid Model for Estimating Software Project Effort from Use Case Points

Mohammad Azzeh, Ali Bou Nassif

Early software effort estimation is a hallmark of successful software project management. Building a reliable effort estimation model usually requires historical data. Unfortunately, since the information available at early stages of software development is scarce, it is recommended to use software size metrics as the key cost factor of effort estimation. Use Case Points (UCP) is a prominent size measure designed mainly for object-oriented projects. Nevertheless, there are no established models that can translate UCP into its corresponding effort; therefore, most models use productivity as a second cost driver. The productivity in those models is usually guessed by experts and does not depend on historical data, which makes it subject to uncertainty. Thus, these models have not been well examined using a large amount of historical data. In this paper, we designed a hybrid model that consists of classification and prediction stages using a support vector machine and radial basis neural networks. The proposed model was constructed over a large number of observations collected from industrial and student projects. The proposed model was compared against previous UCP prediction models. The validation and empirical results demonstrated that the proposed model significantly surpasses these models on all datasets. The main conclusion is that the environmental factors of UCP can be used to classify and estimate productivity.

8.8SEAug 28, 2015

A Comparison Between Decision Trees and Decision Tree Forest Models for Software Development Effort Estimation

Ali Bou Nassif, Mohammad Azzeh, Luiz Fernando Capretz et al.

Accurate software effort estimation has been a challenge for many software practitioners and project managers. Underestimation leads to disruption in the project's estimated cost and delivery. On the other hand, overestimation causes outbidding and financial losses in business. Many software estimation models exist; however, none has been proven to be the best in all situations. In this paper, a decision tree forest (DTF) model is compared to a traditional decision tree (DT) model, as well as a multiple linear regression (MLR) model. The evaluation was conducted using ISBSG and Desharnais industrial datasets. Results show that the DTF model is competitive and can be used as an alternative in software effort prediction.