Ali Bou Nassif

SE
45 papers
2,226 citations
Novelty 28%
AI Score 23

45 Papers

IV Mar 8, 2022
Breast cancer detection using artificial intelligence techniques: A systematic literature review

Ali Bou Nassif, Manar Abu Talib, Qassim Nasir et al.

Cancer is one of the most dangerous diseases to humans, and yet no permanent cure has been developed for it. Breast cancer is one of the most common cancer types. According to the National Breast Cancer Foundation, in 2020 alone, more than 276,000 new cases of invasive breast cancer and more than 48,000 non-invasive cases were diagnosed in the US. To put these figures in perspective, 64% of these cases are diagnosed early in the disease's cycle, giving patients a 99% chance of survival. Artificial intelligence and machine learning have been used effectively in detection and treatment of several dangerous diseases, helping in early diagnosis and treatment, and thus increasing the patient's chance of survival. Deep learning has been designed to analyze the most important features affecting detection and treatment of serious diseases. For example, breast cancer can be detected using genes or histopathological imaging. Analysis at the genetic level is very expensive, so histopathological imaging is the most common approach used to detect breast cancer. In this research work, we systematically reviewed previous work done on detection and treatment of breast cancer using genetic sequencing or histopathological imaging with the help of deep learning and machine learning. We also provide recommendations to researchers who will work in this field.

CL May 6, 2022
Arabic Fake News Detection Based on Deep Contextualized Embedding Models

Ali Bou Nassif, Ashraf Elnagar, Omar Elgendy et al.

Social media is becoming a source of news for many people due to its ease and freedom of use. As a result, fake news has been spreading quickly and easily regardless of its credibility, especially in the last decade. Fake news publishers take advantage of critical situations such as the Covid-19 pandemic and the American presidential elections to affect societies negatively. Fake news can seriously impact society in many fields including politics, finance, sports, etc. Many studies have been conducted to help detect fake news in English, but research conducted on fake news detection in the Arabic language is scarce. Our contribution is twofold: first, we have constructed a large and diverse Arabic fake news dataset. Second, we have developed and evaluated transformer-based classifiers to identify fake news while utilizing eight state-of-the-art Arabic contextualized embedding models. The majority of these models had not been previously used for Arabic fake news detection. We conduct a thorough analysis of the state-of-the-art Arabic contextualized embedding models as well as comparison with similar fake news detection systems. Experimental results confirm that these state-of-the-art models are robust, with accuracy exceeding 98%.

NE Jan 16, 2023
Optimization Algorithms in Smart Grids: A Systematic Literature Review

Sidra Aslam, Ala Altaweel, Ali Bou Nassif

Electrical smart grids are units that supply electricity from power plants to users with reduced costs and power failures/losses and with improved energy management. Smart grids (SGs) are well known for their exceptional benefits such as bi-directional communication, stability, detection of power failures, and inter-connectivity with appliances for monitoring purposes. SGs are the outcome of different modern applications that are used for managing data and security, i.e., modeling, monitoring, optimization, and/or Artificial Intelligence. Hence, the importance of SGs as a research field is increasing with every passing year. This paper focuses on novel features and applications of smart grids in domestic and industrial sectors. Specifically, we focused on Genetic Algorithm, Particle Swarm Optimization, and Grey Wolf Optimization to study the efforts made to date for maximized energy management and cost minimization in SGs. To that end, we collected 145 research works (2011 to 2022) in this systematic literature review. This research work aims to identify different features and applications of SGs proposed in the last decade and investigate the trends in popularity of SGs for different regions of the world. Our finding is that the most popular optimization algorithm being used by researchers to bring forward new solutions for energy management and cost effectiveness in SGs is Particle Swarm Optimization. We also provide a brief overview of objective functions and parameters used in the solutions for energy and cost effectiveness as well as discuss different open research challenges for future research works.

IV Apr 23, 2022
Transformation Invariant Cancerous Tissue Classification Using Spatially Transformed DenseNet

Omar Mahdi, Ali Bou Nassif

In this work, we introduce a spatially transformed DenseNet architecture for transformation invariant classification of cancer tissue. Our architecture increases the accuracy of the base DenseNet architecture while adding the ability to operate in a transformation invariant way while simultaneously being simpler than other models that try to provide some form of invariance.

SD Jan 9, 2022
Emotional Speaker Identification using a Novel Capsule Nets Model

Ali Bou Nassif, Ismail Shahin, Ashraf Elnagar et al.

Speaker recognition systems are widely used in various applications to identify a person by their voice; however, the high degree of variability in speech signals makes this a challenging task. Dealing with emotional variations is very difficult because emotions alter the voice characteristics of a person; thus, the acoustic features differ from those used to train models in a neutral environment. Therefore, speaker recognition models trained on neutral speech fail to correctly identify speakers under emotional stress. Although considerable advancements in speaker identification have been made using convolutional neural networks (CNN), CNNs cannot exploit the spatial association between low-level features. Inspired by the recent introduction of capsule networks (CapsNets), which are based on deep learning to overcome the inadequacy of CNNs in preserving the pose relationship between low-level features with their pooling technique, this study investigates the performance of using CapsNets in identifying speakers from emotional speech recordings. A CapsNet-based speaker identification model is proposed and evaluated using three distinct speech databases, i.e., the Emirati Speech Database, SUSAS Dataset, and RAVDESS (open-access). The proposed model is also compared to baseline systems. Experimental results demonstrate that the novel proposed CapsNet model trains faster and provides better results over current state-of-the-art schemes. The effect of the routing algorithm on speaker identification performance was also studied by varying the number of iterations, both with and without a decoder network.

LG Dec 29, 2021
Artificial Intelligence and Statistical Techniques in Short-Term Load Forecasting: A Review

Ali Bou Nassif, Bassel Soudan, Mohammad Azzeh et al.

Electrical utilities depend on short-term demand forecasting to proactively adjust production and distribution in anticipation of major variations. This systematic review analyzes 240 works published in scholarly journals between 2000 and 2019 that focus on applying Artificial Intelligence (AI), statistical, and hybrid models to short-term load forecasting (STLF). This work represents the most comprehensive review of works on this subject to date. A complete analysis of the literature is conducted to identify the most popular and accurate techniques as well as existing gaps. The findings show that although Artificial Neural Networks (ANN) continue to be the most commonly used standalone technique, researchers have been increasingly opting for hybrid combinations of different techniques to leverage the combined advantages of individual methods. The review demonstrates that it is commonly possible with these hybrid combinations to achieve prediction accuracy exceeding 99%. The most successful horizon for short-term forecasting has been identified as prediction for one day at an hourly interval. The review has identified a deficiency in access to datasets needed for training of the models. A significant gap has been identified in researching regions other than Asia, Europe, North America, and Australia.

SD Dec 26, 2021
Novel Hybrid DNN Approaches for Speaker Verification in Emotional and Stressful Talking Environments

Ismail Shahin, Ali Bou Nassif, Nawel Nemmour et al.

In this work, we conducted an empirical comparative study of the performance of text-independent speaker verification in emotional and stressful environments. This work combined deep models with shallow architecture, which resulted in novel hybrid classifiers. Four distinct hybrid models were utilized: deep neural network-hidden Markov model (DNN-HMM), deep neural network-Gaussian mixture model (DNN-GMM), Gaussian mixture model-deep neural network (GMM-DNN), and hidden Markov model-deep neural network (HMM-DNN). All models were based on novel implemented architecture. The comparative study used three distinct speech datasets: a private Arabic dataset and two public English databases, namely, Speech Under Simulated and Actual Stress (SUSAS) and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The test results of the aforementioned hybrid models demonstrated that the proposed HMM-DNN leveraged the verification performance in emotional and stressful environments. Results also showed that HMM-DNN outperformed all other hybrid models in terms of equal error rate (EER) and area under the curve (AUC) evaluation metrics. The average resulting verification system based on the three datasets yielded EERs of 7.19%, 16.85%, 11.51%, and 11.90% based on HMM-DNN, DNN-HMM, DNN-GMM, and GMM-DNN, respectively. Furthermore, we found that the DNN-GMM model demonstrated the least computational complexity compared to all other hybrid models in both talking environments. Conversely, the HMM-DNN model required the greatest amount of training time. Findings also demonstrated that EER and AUC values depended on the database when comparing average emotional and stressful performances.
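
The equal error rate (EER) reported above is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be approximated from verification scores; the function name and the toy scores are hypothetical and are not taken from the paper, and it assumes higher scores mean "more likely the claimed speaker".

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from similarity scores (higher = same speaker)."""
    best_gap, eer = float("inf"), None
    # Try every observed score as a decision threshold.
    for t in sorted(set(genuine) | set(impostor)):
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy scores, invented purely for illustration.
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.7, 0.5, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

A lower EER indicates a better verification system, which is why the 7.19% figure for HMM-DNN is the strongest of the four averages quoted.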

SD Dec 26, 2021
Novel Dual-Channel Long Short-Term Memory Compressed Capsule Networks for Emotion Recognition

Ismail Shahin, Noor Hindawi, Ali Bou Nassif et al.

Recent research on speech emotion recognition (SER) has made considerable advances with the use of MFCCs spectrogram features and the implementation of neural network approaches such as convolutional neural networks (CNNs). Capsule networks (CapsNet) have gained recognition as alternatives to CNNs with their larger capacities for hierarchical representation. To address these issues, this research introduces a novel text-independent and speaker-independent SER architecture, where a dual-channel long short-term memory compressed-CapsNet (DC-LSTM COMP-CapsNet) algorithm is proposed based on the structural features of CapsNet. Our proposed novel classifier can ensure the energy efficiency of the model and an adequate compression method in speech emotion recognition, which is not delivered through the original structure of a CapsNet. Moreover, the grid search approach is used to attain optimal solutions. Results showed improved performance and a reduction in the training and testing running time. The speech datasets used to evaluate our algorithm are: Arabic Emirati-accented corpus, English speech under simulated and actual stress corpus, English Ryerson audio-visual database of emotional speech and song corpus, and crowd-sourced emotional multimodal actors dataset. This work reveals that the optimum feature extraction method compared to other known methods is MFCCs delta-delta. Using the four datasets and the MFCCs delta-delta, DC-LSTM COMP-CapsNet surpasses all the state-of-the-art systems, classical classifiers, CNN, and the original CapsNet. Using the Arabic Emirati-accented corpus, our results demonstrate that the proposed work yields average emotion recognition accuracy of 89.3% compared to 84.7%, 82.2%, 69.8%, 69.2%, 53.8%, 42.6%, and 31.9% based on CapsNet, CNN, support vector machine, multi-layer perceptron, k-nearest neighbor, radial basis function, and naive Bayes, respectively.

LG Dec 15, 2021
COVID-19 Electrocardiograms Classification using CNN Models

Ismail Shahin, Ali Bou Nassif, Mohamed Bader Alsabek

With the periodic rise and fall of COVID-19 and numerous countries being affected by its ramifications, a tremendous amount of work has been done by scientists, researchers, and doctors all over the world. Prompt intervention is keenly needed to tackle the unconscionable dissemination of the disease. The implementation of Artificial Intelligence (AI) has made a significant contribution to the digital health field by applying the fundamentals of deep learning algorithms. In this study, a novel approach is proposed to automatically diagnose COVID-19 using Electrocardiogram (ECG) data with the integration of deep learning algorithms, specifically Convolutional Neural Network (CNN) models. Several CNN models have been utilized in this proposed framework, including VGG16, VGG19, InceptionResnetv2, InceptionV3, Resnet50, and Densenet201. The VGG16 model has outperformed the rest of the models, with an accuracy of 85.92%. Our results show a relatively low accuracy in the rest of the models compared to the VGG16 model, which is due to the small size of the utilized dataset, in addition to the exclusive utilization of the Grid search hyperparameters optimization approach for the VGG16 model only. Moreover, our results are preliminary, and there is a possibility to enhance the accuracy of all models by further expanding the dataset and adopting a suitable hyperparameters optimization technique.

SD Dec 15, 2021
The exploitation of Multiple Feature Extraction Techniques for Speaker Identification in Emotional States under Disguised Voices

Noor Ahmad Al Hindawi, Ismail Shahin, Ali Bou Nassif

Due to improvements in artificial intelligence, speaker identification (SI) technologies have advanced considerably and are now widely used in a variety of sectors. One of the most important components of SI is feature extraction, which has a substantial impact on the SI process and performance. As a result, numerous feature extraction strategies are thoroughly investigated, contrasted, and analyzed. This article exploits five distinct feature extraction methods for speaker identification in disguised voices under emotional environments. To evaluate this work, three voice-disguise effects are used: high-pitched, low-pitched, and Electronic Voice Conversion (EVC). Experimental results show that the concatenation of Mel-Frequency Cepstral Coefficients (MFCCs), MFCCs-delta, and MFCCs-delta-delta is the best feature extraction method.
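
To make the concatenated feature concrete, the sketch below computes delta and delta-delta coefficients with the commonly used regression formula and stacks them onto the static coefficients frame by frame. It assumes the MFCCs are already available as a list of per-frame vectors; the function names, the window width of 2, and the toy input are illustrative assumptions, not details from the paper.

```python
def delta(frames, width=2):
    """Regression-based delta of per-frame feature vectors (edges are padded
    by repeating the first and last frame)."""
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc):
    """Concatenate static, delta, and delta-delta coefficients per frame."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return [m + a + b for m, a, b in zip(mfcc, d1, d2)]


# Toy "MFCC" frames with two coefficients each (values are made up):
# the first coefficient rises linearly, the second is constant.
mfcc = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
stacked = stack_features(mfcc)
```

Each stacked frame is three times as long as the original vector, so a 13-coefficient MFCC frame would become a 39-dimensional feature under this scheme.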

CL Dec 1, 2021
Empirical evaluation of shallow and deep learning classifiers for Arabic sentiment analysis

Ali Bou Nassif, Abdollah Masoud Darya, Ashraf Elnagar

This work presents a detailed comparison of the performance of deep learning models such as convolutional neural networks (CNN), long short-term memory (LSTM), gated recurrent units (GRU), their hybrids, and a selection of shallow learning classifiers for sentiment analysis of Arabic reviews. Additionally, the comparison includes state-of-the-art models such as the transformer architecture and the araBERT pre-trained model. The datasets used in this study are multi-dialect Arabic hotel and book review datasets, which are some of the largest publicly available datasets for Arabic reviews. Results showed deep learning outperforming shallow learning for binary and multi-label classification, in contrast with the results of similar work reported in the literature. This discrepancy in outcome was caused by dataset size as we found it to be proportional to the performance of deep learning models. The performance of deep and shallow learning techniques was analyzed in terms of accuracy and F1 score. The best performing shallow learning technique was Random Forest followed by Decision Tree, and AdaBoost. The deep learning models performed similarly using a default embedding layer, while the transformer model performed best when augmented with araBERT.

SE Nov 12, 2021
Reliability Models for Smartphone Applications

Sonia Meskini, Ali Bou Nassif, Luiz Fernando Capretz

Smartphones have become the most used electronic devices. They carry out most of the functionalities of desktops, offering various useful applications that suit the users' needs. Because the user, rather than the operator, has become the main controller of the device and its applications, reliability has become an emergent requirement. As a first step, based on collected smartphone application failure data, we investigated and evaluated the efficacy of Software Reliability Growth Models (SRGMs) when applied to these smartphone data in order to check whether they achieve the same accuracy as in the desktop/laptop area. None of the selected models were able to account for the smartphone data satisfactorily. Their failure is traced back to: (i) the hardware and software differences between desktops and smartphones, (ii) the specific features of mobile applications compared to desktop applications, and (iii) the different operational conditions and usage profiles. Thus, a reliability model suited to smartphone applications is still needed. In the second step, we applied the Weibull and Gamma distributions, and their two particular cases, Rayleigh and S-Shaped, to model the smartphone failure data sorted by application version number and grouped into different time periods. An estimation of the expected number of defects in each application version was obtained. The performances of the distributions were then compared with each other. We found that both Weibull and Gamma distributions can fit the failure data of mobile applications, although the Gamma distribution is frequently better suited.

SE Feb 11, 2021
Empirical Analysis on Productivity Prediction and Locality for Use Case Points Method

Mohammad Azzeh, Ali Bou Nassif, Cuauhtemoc Lopez Martin

The Use Case Points (UCP) method has been around for over two decades. Although there has been substantial criticism concerning the algebraic construction and factor assessment of UCP, it remains an efficient early size estimation method. Predicting software effort from UCP is still an ever-present challenge. The earlier version of the UCP method suggested using productivity as a cost driver, where a fixed or a few pre-defined productivity ratios have been widely agreed upon. While this approach was successful when not enough historical data was available, it is no longer acceptable because software projects differ in terms of development aspects. Therefore, it is better to understand the relationship between productivity and other UCP variables. This paper examines the impact of data locality approaches on productivity and effort prediction from multiple UCP variables. The environmental factors are used as partitioning factors to produce local homogeneous data either based on their influential levels or using clustering algorithms. Different machine learning methods, including solo and ensemble methods, are used to construct productivity and effort prediction models based on the local data. The results demonstrate that the prediction models that are created based on local data surpass models that use the entire data. Also, the results show that conforming to the hypothetical assumption between productivity and environmental factors is not necessarily a requirement for the success of locality.
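
For readers unfamiliar with the method, the sketch below shows the classic UCP size calculation and the fixed-productivity effort conversion that the abstract argues against. The formulas and the default of 20 person-hours per UCP follow Karner's original formulation from the wider UCP literature, not this abstract; the function names and input values are hypothetical.

```python
def use_case_points(uucw, uaw, tf, ef):
    """Classic UCP size: unadjusted weights scaled by two adjustment factors.

    uucw: unadjusted use case weight, uaw: unadjusted actor weight,
    tf: weighted sum of technical factors, ef: weighted sum of environmental factors.
    """
    tcf = 0.6 + 0.01 * tf   # technical complexity factor
    ecf = 1.4 - 0.03 * ef   # environmental complexity factor
    return (uucw + uaw) * tcf * ecf


def effort_hours(ucp, productivity=20.0):
    """Effort as size times a productivity ratio (person-hours per UCP)."""
    return ucp * productivity


# Made-up project figures, for illustration only.
ucp = use_case_points(uucw=100, uaw=10, tf=40, ef=20)
fixed_effort = effort_hours(ucp)                        # fixed ratio
tuned_effort = effort_hours(ucp, productivity=28.0)     # project-specific ratio
```

Treating `productivity` as an argument rather than a constant mirrors the paper's point: the ratio can be predicted per project, for example from locally similar historical projects, instead of being fixed in advance.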

SD Feb 11, 2021
CASA-Based Speaker Identification Using Cascaded GMM-CNN Classifier in Noisy and Emotional Talking Conditions

Ali Bou Nassif, Ismail Shahin, Shibani Hamsa et al.

This work aims at improving text-independent speaker identification performance in real application situations such as noisy and emotional talking conditions. This is achieved by incorporating two different modules: a Computational Auditory Scene Analysis (CASA)-based pre-processing module for noise reduction and a cascaded Gaussian Mixture Model-Convolutional Neural Network (GMM-CNN) classifier for speaker identification followed by emotion recognition. This research proposes and evaluates a novel algorithm to improve the accuracy of speaker identification in emotional and highly noise-susceptible conditions. Experiments demonstrate that the proposed model yields promising results in comparison with other classifiers when the Speech Under Simulated and Actual Stress (SUSAS) database, the Emirati Speech Database (ESD), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) database, and the Fluent Speech Commands database are used in a noisy environment.

LG Jan 11, 2021
Machine Learning Towards Intelligent Systems: Applications, Challenges, and Opportunities

MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif et al.

The emergence of and continued reliance on the Internet and related technologies has resulted in the generation of large amounts of data that can be made available for analyses. However, humans do not possess the cognitive capabilities to understand such large amounts of data. Machine learning (ML) provides a mechanism for humans to process large amounts of data, gain insights about the behavior of the data, and make more informed decisions based on the resulting analysis. ML has applications in various fields. This review focuses on some of the fields and applications such as education, healthcare, network security, banking and finance, and social media. Within these fields, there are multiple unique challenges that exist. However, ML can provide solutions to these challenges, as well as create further research opportunities. Accordingly, this work surveys some of the challenges facing the aforementioned fields and presents some of the previous literature works that tackled them. Moreover, it suggests several research opportunities that benefit from the use of ML to address these challenges.

SE Dec 13, 2020
Predicting Software Effort from Use Case Points: A Systematic Review

Mohammad Azzeh, Ali Bou Nassif, Imtinan Attili

Context: Predicting software project effort from the Use Case Points (UCP) method is increasingly used among researchers and practitioners. However, unlike other effort estimation domains, this area of interest has not been systematically reviewed. Aims: There is a need for a systematic literature review to provide direction and support for this research area of effort estimation. Specifically, the objective of this study is twofold: to classify UCP effort estimation papers based on four criteria: contribution type, research approach, dataset type and techniques used with UCP; and to analyze these papers from different views: estimation accuracy, favorable estimation context and impact of combined techniques on the accuracy of UCP. Method: We used the systematic literature review methodology proposed by Kitchenham and Charters. This includes searching for the most relevant papers, selecting quality papers, extracting data and drawing results. Result: The authors of UCP research papers are generally not aware of previously published results and conclusions in the field of UCP effort estimation. There is a lack of UCP-related publications in the top software engineering journals. This leads to the conclusion that such papers are not useful for the community. Furthermore, most articles used small numbers of projects, which cannot support generalizing the conclusion in most cases. Conclusions: There are multiple research directions for the UCP method that have not been examined so far, such as validating the algebraic construction of UCP based on industrial data. Also, there is a need for standard automated tools that govern the process of translating a use case diagram into its corresponding UCP metrics. Although there is an increased interest among researchers in collecting industrial data and building effort prediction models based on machine learning methods, the quality of data is still subject to debate.

CR Aug 9, 2020
Multi-Stage Optimized Machine Learning Framework for Network Intrusion Detection

MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif et al.

Cyber-security garnered significant attention due to the increased dependency of individuals and organizations on the Internet and their concern about the security and privacy of their online activities. Several previous machine learning (ML)-based network intrusion detection systems (NIDSs) have been developed to protect against malicious online behavior. This paper proposes a novel multi-stage optimized ML-based NIDS framework that reduces computational complexity while maintaining its detection performance. This work studies the impact of oversampling techniques on the models' training sample size and determines the minimal suitable training sample size. Furthermore, it compares between two feature selection techniques, information gain and correlation-based, and explores their effect on detection performance and time complexity. Moreover, different ML hyper-parameter optimization techniques are investigated to enhance the NIDS's performance. The performance of the proposed framework is evaluated using two recent intrusion detection datasets, the CICIDS 2017 and the UNSW-NB 2015 datasets. Experimental results show that the proposed model significantly reduces the required training sample size (up to 74%) and feature set size (up to 50%). Moreover, the model performance is enhanced with hyper-parameter optimization with detection accuracies over 99% for both datasets, outperforming recent literature works by 1-2% higher accuracy and 1-2% lower false alarm rate.

LG Aug 5, 2020
Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection

MohammadNoor Injadat, Fadi Salo, Ali Bou Nassif et al.

Network attacks have become very prevalent, and their rate is growing tremendously. Both organizations and individuals are now concerned about the confidentiality, integrity and availability of their critical information, which are often impacted by network attacks. To that end, several previous machine learning-based intrusion detection methods have been developed to secure network infrastructure from such attacks. In this paper, an effective anomaly detection framework is proposed utilizing the Bayesian Optimization technique to tune the parameters of Support Vector Machine with Gaussian Kernel (SVM-RBF), Random Forest (RF), and k-Nearest Neighbor (k-NN) algorithms. The performance of the considered algorithms is evaluated using the ISCX 2012 dataset. Experimental results show the effectiveness of the proposed framework in terms of accuracy rate, precision, low false alarm rate, and recall.

CY Jun 9, 2020
Multi-split Optimized Bagging Ensemble Model Selection for Multi-class Educational Data Mining

MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif et al.

Predicting students' academic performance has been a research area of interest in recent years with many institutions focusing on improving the students' performance and the education quality. The analysis and prediction of students' performance can be achieved using various data mining techniques. Moreover, such techniques allow instructors to determine possible factors that may affect the students' final marks. To that end, this work analyzes two different undergraduate datasets at two different universities. Furthermore, this work aims to predict the students' performance at two stages of course delivery (20% and 50% respectively). This analysis allows for properly choosing the appropriate machine learning algorithms to use as well as optimize the algorithms' parameters. Furthermore, this work adopts a systematic multi-split approach based on Gini index and p-value. This is done by optimizing a suitable bagging ensemble learner that is built from any combination of six potential base machine learning algorithms. It is shown through experimental results that the posited bagging ensemble models achieve high accuracy for the target group for both datasets.

CR May 23, 2020
Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review

Fadi Salo, MohammadNoor Injadat, Ali Bou Nassif et al.

Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation. The exponential expansion in the deployment of cloud technology has produced a massive amount of data from a variety of applications, resources and platforms. In turn, the rapid rate and volume of data creation has begun to pose significant challenges for data management and security. The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance. In this paper, we conduct a systematic literature review (SLR) of data mining techniques (DMT) used in IDS-based solutions through the period 2013-2018. We employed criterion-based, purposive sampling identifying 32 articles, which constitute the primary source of the present survey. After a careful investigation of these articles, we identified 17 separate DMTs deployed in an IDS context. This paper also presents the merits and disadvantages of the various works of current research that implemented DMTs and distributed streaming frameworks (DSF) to detect and/or prevent malicious attacks in a big data environment.

CY May 13, 2020
Systematic Ensemble Model Selection Approach for Educational Data Mining

MohammadNoor Injadat, Abdallah Moubayed, Ali Bou Nassif et al.

A plethora of research has been done in the past focusing on predicting students' performance in order to support their development. Many institutions are focused on improving performance and education quality; this can be achieved by utilizing data mining techniques to analyze and predict students' performance and to determine possible factors that may affect their final marks. To address this issue, this work starts by thoroughly exploring and analyzing two different datasets at two separate stages of course delivery (20 percent and 50 percent respectively) using multiple graphical, statistical, and quantitative techniques. The feature analysis provides insights into the nature of the different features considered and helps in the choice of the machine learning algorithms and their parameters. Furthermore, this work proposes a systematic approach based on Gini index and p-value to select a suitable ensemble learner from a combination of six potential machine learning algorithms. Experimental results show that the proposed ensemble models achieve high accuracy and low false positive rate at all stages for both datasets.

SE Mar 22, 2020
Software Effort Estimation from Use Case Diagrams Using Nonlinear Regression Analysis

Ali Bou Nassif, Manar AbuTaleb, Luiz Fernando Capretz

Software effort estimation in the early stages of the software life cycle is one of the most essential and daunting tasks for project managers. In this research, a new model based on non-linear regression analysis is proposed to predict software effort from use case diagrams. It is concluded that, where software size is classified from small to very large, one linear or non-linear equation for effort estimation cannot be applied. Our model with three different non-linear regression equations can incorporate the different ranges in software size.

HC Jan 17, 2020
EEG Wheelchair for People of Determination

Mariam AlAbboudi, Maitha Majed, Fatima Hassan et al.

The aim of this paper is to design and construct an electroencephalograph (EEG) based brain-controlled wheelchair to provide a communication bridge from the nervous system to the external technical device for people of determination or individuals suffering from partial or complete paralysis. EEG is a technique that reads the activity of the brain by capturing brain signals non-invasively using a special EEG headset. The signals acquired go through pre-processing, feature extraction and classification. This technique allows human thoughts alone to be converted into commands that control the wheelchair. The commands used are right, left, forward, backward, and stop. The brain signals are acquired using the Emotiv Epoc headset. Discrete Wavelet Transform is used for feature extraction and Support Vector Machine (SVM) is used for classification.

SEOct 6, 2019
Can we rely on smartphone applications?

Sonia Meskini, Ali Bou Nassif, Luiz Fernando Capretz

Smartphones are becoming necessary tools in the daily lives of millions of users who rely on these devices and their applications. There are thousands of applications for smartphone devices such as the iPhone, Blackberry, and Android, thus their reliability has become paramount for their users. This work aims to answer two related questions: (1) Can we assess the reliability of mobile applications by using the traditional reliability models? (2) Can we model adequately the failure data collected from many users? Firstly, it has been proved that the three most used software reliability models have fallen short of the mark when applied to smartphone applications; their failures were traced back to specific features of mobile applications. Secondly, it has been demonstrated that the Weibull and Gamma distribution models can adequately fit the observed failure data, thus providing better means to predict the reliability of smartphone applications.
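
To illustrate what fitting a Weibull model to failure data gives you, here is a minimal sketch of the standard two-parameter Weibull reliability function R(t) = exp(-(t/scale)^shape). The function names and the parameter values in the usage note are hypothetical and are not taken from the paper; the abstract does not report fitted parameters.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that no failure has occurred by time t.

    Standard two-parameter Weibull: R(t) = exp(-(t / scale) ** shape).
    `shape` and `scale` are assumed to come from a fit to observed
    failure times (the fitting step is not shown here).
    """
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be >= 0 and shape, scale must be > 0")
    return math.exp(-((t / scale) ** shape))


def weibull_failure_probability(t, shape, scale):
    """Probability that at least one failure has occurred by time t."""
    return 1.0 - weibull_reliability(t, shape, scale)
```

For example, with an illustrative shape of 1.0 the model reduces to the exponential distribution, so `weibull_reliability(50, 1.0, 100)` equals `math.exp(-0.5)`.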

SDSep 29, 2019
Speaker Verification in Emotional Talking Environments based on Third-Order Circular Suprasegmental Hidden Markov Model

Ismail Shahin, Ali Bou Nassif

Speaker verification accuracy in emotional talking environments is not as high as it is in neutral ones. This work aims at accepting or rejecting the claimed speaker using his/her voice in emotional environments based on the Third-Order Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. An Emirati-accented (Arabic) speech database with Mel-Frequency Cepstral Coefficients as the extracted features has been used to evaluate our work. Our results demonstrate that speaker verification accuracy based on CSPHMM3 is greater than that based on the state-of-the-art classifiers and models such as Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector Quantization (VQ).

SDSep 28, 2019
Emirati-Accented Speaker Identification in Stressful Talking Conditions

Ismail Shahin, Ali Bou Nassif

This research is dedicated to improving text-independent Emirati-accented speaker identification performance in stressful talking conditions using three distinct classifiers: First-Order Hidden Markov Models (HMM1s), Second-Order Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models (HMM3s). The database that has been used in this work was collected from Emirati native speakers (25 per gender) uttering eight widespread Emirati sentences in each of neutral, shouted, slow, loud, soft, and fast talking conditions. The features extracted from the captured database are Mel-Frequency Cepstral Coefficients (MFCCs). Based on HMM1s, HMM2s, and HMM3s, average Emirati-accented speaker identification accuracy in stressful conditions is 58.6%, 61.1%, and 65.0%, respectively. The achieved average speaker identification accuracy in stressful conditions based on HMM3s is very similar to that attained in subjective assessment by human listeners.

SEFeb 10, 2019
Software Development Effort Estimation Using Regression Fuzzy Models

Ali Bou Nassif, Mohammad Azzeh, Ali Idri et al.

Software effort estimation plays a critical role in project management. Erroneous results may lead to overestimating or underestimating effort, which can have catastrophic consequences on project resources. Machine-learning techniques are increasingly popular in the field. Fuzzy logic models, in particular, are widely used to deal with imprecise and inaccurate data. The main goal of this research was to design and compare three different fuzzy logic models for predicting software estimation effort: Mamdani, Sugeno with constant output and Sugeno with linear output. To assist in the design of the fuzzy logic models, we conducted regression analysis, an approach we call regression fuzzy logic. State-of-the-art and unbiased performance evaluation criteria such as standardized accuracy, effect size and mean balanced relative error were used to evaluate the models, as well as statistical tests. Models were trained and tested using industrial projects from the International Software Benchmarking Standards Group (ISBSG) dataset. Results showed that data heteroscedasticity affected model performance. Fuzzy logic models were found to be very sensitive to outliers. We concluded that when regression analysis was used to design the model, the Sugeno fuzzy inference system with linear output outperformed the other models.

LGDec 16, 2018
Ensemble of Learning Project Productivity in Software Effort Based on Use Case Points

Mohammad Azzeh, Ali Bou Nassif, Shadi Banitaan et al.

It is well recognized that project productivity is a key driver in estimating software project effort from the Use Case Point size metric at early software development stages. Although a few models have been proposed for predicting productivity, there is no consistent conclusion regarding which model is superior. Therefore, instead of building a new productivity prediction model, this paper presents a new ensemble construction mechanism applied to software project productivity prediction. An ensemble is an effective technique when the performance of base models is poor. We propose a weighted mean method to aggregate predicted productivities based on the average of errors produced by the training models. The obtained results show that using an ensemble is a good alternative approach when base models are not consistently accurate over different datasets, and when models behave diversely.
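
The weighted mean aggregation described above can be sketched as follows. The abstract only says the weights are based on the average training error of each base model; weighting each model by the inverse of its mean error is an assumption made here for illustration, and the function names are hypothetical.

```python
def inverse_error_weights(mean_errors):
    """Turn per-model mean training errors into weights that sum to 1.

    Assumption (not stated in the abstract): a model's weight is
    proportional to 1 / (its mean training error), so more accurate
    base models contribute more to the ensemble.
    """
    if any(e <= 0 for e in mean_errors):
        raise ValueError("mean errors must be positive")
    inverse = [1.0 / e for e in mean_errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_mean_prediction(predictions, weights):
    """Aggregate the base models' predicted productivities."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is needed per prediction")
    return sum(p * w for p, w in zip(predictions, weights))
```

With two base models whose mean training errors are 1.0 and 3.0, the weights come out as 0.75 and 0.25, so predicted productivities of 10 and 20 aggregate to 12.5.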

SEDec 15, 2018
v-SVR Polynomial Kernel for Predicting the Defect Density in New Software Projects

Cuauhtemoc Lopez-Martin, Mohammad Azzeh, Ali Bou Nassif et al.

An important product measure to determine the effectiveness of software processes is the defect density (DD). In this study, we propose the application of support vector regression (SVR) to predict the DD of new software projects obtained from the International Software Benchmarking Standards Group (ISBSG) Release 2018 data set. Two types of SVR (e-SVR and v-SVR) were applied to train and test these projects. Each SVR used four types of kernels. The prediction accuracy of each SVR was compared to that of a statistical regression (i.e., a simple linear regression, SLR). A statistical significance test showed that the prediction accuracy of v-SVR with a polynomial kernel was better than that of SLR when new software projects were developed on mainframes and coded in third-generation programming languages.

LGNov 26, 2018
Machine Learning Classifications of Coronary Artery Disease

Ali Bou Nassif, Omar Mahdi, Qassim Nasir et al.

Coronary Artery Disease (CAD) is one of the leading causes of death worldwide, and so it is very important to correctly diagnose patients with the disease. For medical diagnosis, machine learning is a useful tool; however, features and algorithms must be carefully selected to get accurate classification. To this effect, three feature selection methods have been used on 13 input features from the Cleveland dataset with 297 entries, and 7 were selected. The selected features were used to train three different classifiers, namely SVM, Naïve Bayes and KNN, using 10-fold cross-validation. The resulting models were evaluated using Accuracy, Recall, Specificity and Precision. It is found that the Naïve Bayes classifier performs the best on this dataset and features, outperforming or matching SVM and KNN in all four evaluation parameters used and achieving an accuracy of 84%.
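
The four evaluation measures named above have standard definitions in terms of a binary confusion matrix. The sketch below computes them from raw counts; the function name is hypothetical and the counts in the usage note are made up for illustration, not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, recall, specificity and precision from confusion counts.

    tp/fn: diseased patients classified correctly / incorrectly.
    tn/fp: healthy patients classified correctly / incorrectly.
    """
    total = tp + fp + tn + fn
    if total == 0 or tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("counts leave at least one metric undefined")
    return {
        "accuracy": (tp + tn) / total,     # share of all correct decisions
        "recall": tp / (tp + fn),          # share of diseased cases found
        "specificity": tn / (tn + fp),     # share of healthy cases cleared
        "precision": tp / (tp + fp),       # share of positive calls that are right
    }
```

For instance, `classification_metrics(tp=40, fp=10, tn=45, fn=5)` gives an accuracy of 0.85 and a precision of 0.8.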

SDOct 11, 2018
Novel Cascaded Gaussian Mixture Model-Deep Neural Network Classifier for Speaker Identification in Emotional Talking Environments

Ismail Shahin, Ali Bou Nassif, Shibani Hamsa

This research is an effort to present an effective approach to enhance text-independent speaker identification performance in emotional talking environments based on a novel classifier called cascaded Gaussian Mixture Model-Deep Neural Network (GMM-DNN). Our current work focuses on proposing, implementing and evaluating a new approach for speaker identification in emotional talking environments based on cascaded Gaussian Mixture Model-Deep Neural Network as a classifier. The results point out that the cascaded GMM-DNN classifier improves speaker identification performance at various emotions using two distinct speech databases: the Emirati speech database (Arabic United Arab Emirates dataset) and the Speech Under Simulated and Actual Stress (SUSAS) English dataset. The proposed classifier outperforms classical classifiers such as Multilayer Perceptron (MLP) and Support Vector Machine (SVM) in each dataset. Speaker identification performance that has been attained based on the cascaded GMM-DNN is similar to that acquired from subjective assessment by human listeners.

SDSep 3, 2018
Three-Stage Speaker Verification Architecture in Emotional Talking Environments

Ismail Shahin, Ali Bou Nassif

Speaker verification performance in a neutral talking environment is usually high, while it decreases sharply in emotional talking environments. This performance degradation is due to the mismatch between training in a neutral environment and testing in emotional environments. In this work, a three-stage speaker verification architecture has been proposed to enhance speaker verification performance in emotional environments. This architecture comprises three cascaded stages: a gender identification stage followed by an emotion identification stage followed by a speaker verification stage. The proposed framework has been evaluated on two distinct and independent emotional speech datasets: an in-house dataset and the Emotional Prosody Speech and Transcripts dataset. Our results show that speaker verification based on both gender information and emotion information is superior to each of speaker verification based on gender information only, emotion information only, and neither gender information nor emotion information. The attained average speaker verification performance based on the proposed framework is very similar to that attained in subjective assessment by human listeners.

SDMar 31, 2018
Emirati-Accented Speaker Identification in each of Neutral and Shouted Talking Environments

Ismail Shahin, Ali Bou Nassif, Mohammed Bahutair

This work is devoted to capturing an Emirati-accented speech database (Arabic United Arab Emirates database) in each of neutral and shouted talking environments in order to study and enhance text-independent Emirati-accented speaker identification performance in a shouted environment based on each of First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s), Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s), and Third-Order Circular Suprasegmental Hidden Markov Models (CSPHMM3s) as classifiers. In this research, our database was collected from fifty Emirati native speakers (twenty-five per gender) uttering eight common Emirati sentences in each of neutral and shouted talking environments. The features extracted from our collected database are Mel-Frequency Cepstral Coefficients (MFCCs). Our results show that average Emirati-accented speaker identification performance in a neutral environment is 94.0%, 95.2%, and 95.9% based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. On the other hand, the average performance in a shouted environment is 51.3%, 55.5%, and 59.3% based, respectively, on CSPHMM1s, CSPHMM2s, and CSPHMM3s. The achieved average speaker identification performance in a shouted environment based on CSPHMM3s is very similar to that obtained in subjective assessment by human listeners.

SEMay 29, 2017
A training process for improving the quality of software projects developed by a practitioner

Cuauhtémoc López-Martín, Ali Bou Nassif, Alain Abran

Background: The quality of a software product depends on the quality of the software process followed in developing the product. Therefore, many higher education institutions (HEI) and software organizations have implemented software process improvement (SPI) training courses to improve the software quality. Objective: Because the duration of a course is a concern for HEI and software organizations, we investigate whether the quality of software projects will be improved by reorganizing the activities of the ten assignments of the original personal software process (PSP) course into a modified PSP having fewer assignments (i.e., seven assignments). Method: The assignments were developed by following a modified PSP with fewer assignments but including the phases, forms, standards, and logs suggested in the original PSP. The measurement of the quality of the software assignments was based on defect density. Results: When the activities in the original PSP were reordered into fewer assignments, the defect density improved with statistical significance as practitioners progressed through the PSP training. Conclusions: Our modified PSP could be applied in academic and industrial environments that are concerned with reducing the PSP training time.

SEMay 28, 2017
Analyzing the Relationship between Project Productivity and Environment Factors in the Use Case Points Method

Mohammad Azzeh, Ali Bou Nassif

Project productivity is a key factor for producing effort estimates from Use Case Points (UCP), especially when the historical dataset is absent. The first versions of UCP effort estimation models used a fixed number or very limited numbers of productivity ratios for all new projects. These approaches have not been well examined over a large number of projects, so the validity of these studies was a matter for criticism. The newly available large software datasets allow us to perform further research on the usefulness of productivity for effort estimation of software development. Specifically, we studied the relationship between project productivity and UCP environmental factors, as they have a significant impact on the amount of productivity needed for a software project. Therefore, we designed four studies, using various classification and regression methods, to examine the usefulness of that relationship and its impact on UCP effort estimation. The results we obtained are encouraging and show potential improvement in effort estimation. Furthermore, the efficiency of that relationship is better over a dataset that comes from industry because of the quality of data collection. Our comment on the findings is that it is better to exclude environmental factors from calculating UCP and make them available only for computing productivity. The study also encourages project managers to understand how to better assess the environmental factors as they do have a significant impact on productivity.

SEMar 11, 2017
Analogy-based effort estimation: a new method to discover set of analogies from dataset characteristics

Mohammad Azzeh, Ali Bou Nassif

Analogy-based effort estimation (ABE) is one of the efficient methods for software effort estimation because of its outstanding performance and capability of handling noisy datasets. Conventional ABE models usually use the same number of analogies for all projects in the datasets in order to make good estimates. The authors' claim is that using the same number of analogies may produce the best overall performance for the whole dataset but not necessarily the best performance for each individual project. Therefore, there is a need to better understand the dataset characteristics in order to discover the optimum set of analogies for each project rather than using a static k nearest projects. Method: We propose a new technique based on the Bisecting k-medoids clustering algorithm to come up with the best set of analogies for each individual project before making the prediction. Results & Conclusions: With Bisecting k-medoids it is possible to better understand the dataset characteristics and automatically find the best set of analogies for each test project. Performance figures of the proposed estimation method are promising and better than those of other regular ABE models.
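
For context, the conventional fixed-k ABE baseline that the abstract contrasts itself with can be sketched as below. This is not the proposed Bisecting k-medoids technique; it assumes Euclidean distance over already-normalized numeric features and an unweighted mean of the k nearest efforts, and the function name and data are hypothetical.

```python
import math


def estimate_by_analogy(target_features, history, k):
    """Fixed-k analogy-based estimate.

    history: list of (features, effort) pairs for completed projects,
             where features is a tuple of numeric, pre-normalized values.
    Returns the mean effort of the k historical projects closest to
    the target under Euclidean distance.
    """
    if k < 1 or not history:
        raise ValueError("need k >= 1 and a non-empty history")
    ranked = sorted(history, key=lambda item: math.dist(target_features, item[0]))
    nearest = ranked[:k]  # the same k is used for every target project
    return sum(effort for _, effort in nearest) / len(nearest)
```

Because `k` is a single global setting here, every project is estimated from the same number of analogies, which is the limitation the paper sets out to remove.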

SEMar 11, 2017
Fuzzy Model Tree For Early Effort Estimation

Mohammad Azzeh, Ali Bou Nassif

Use Case Points (UCP) is a well-known method to estimate the project size, based on the Use Case diagram, at early phases of software development. Although the Use Case diagram is widely accepted worldwide as a de-facto model for analyzing object-oriented software requirements, the UCP method has not received sufficient attention because, as yet, there is no consensus on how to produce software effort from UCP. This paper aims to study the potential of using Fuzzy Model Tree to derive effort estimates based on the UCP size measure using a dataset collected for that purpose. The proposed approach has been validated against the Treeboost model, Multiple Linear Regression and classical effort estimation based on the UCP model. The obtained results are promising and show better performance than those obtained by classical UCP and Multiple Linear Regression, and slightly better performance than those obtained by the Treeboost model.

SEMar 11, 2017
An empirical evaluation of ensemble adjustment methods for analogy-based effort estimation

Mohammad Azzeh, Ali Bou Nassif, Leandro L Minku

Objective: This paper investigates the potential of ensemble learning for variants of adjustment methods used in analogy-based effort estimation. The number k of analogies to be used is also investigated. Method: We perform a large-scale comparison study where many ensembles constructed from n out of 40 possible valid variants of adjustment methods are applied to eight datasets. The performance of each method was evaluated based on standardized accuracy and effect size. Results: The results have been subjected to statistical significance testing, and show reasonably significant improvements in predictive performance where ensemble methods are applied. Conclusion: Our conclusions suggest that ensembles of adjustment methods can work well and achieve good performance, even though they are not always superior to single methods. We also recommend constructing ensembles from only linear adjustment methods, as they have shown better performance and were frequently ranked higher.

SEDec 4, 2016
Enhancing Use Case Points Estimation Method Using Soft Computing Techniques

Ali Bou Nassif, Luiz Fernando Capretz, Danny Ho

Software estimation is a crucial task in software engineering. Software estimation encompasses cost, effort, schedule, and size. The importance of software estimation becomes critical in the early stages of the software life cycle when the details of software have not been revealed yet. Several commercial and non-commercial tools exist to estimate software in the early stages. Most software effort estimation methods require software size as one of the important metric inputs and consequently, software size estimation in the early stages becomes essential. One of the approaches that has been used for about two decades in the early size and effort estimation is called use case points. Use case points method relies on the use case diagram to estimate the size and effort of software projects. Although the use case points method has been widely used, it has some limitations that might adversely affect the accuracy of estimation. This paper presents some techniques using fuzzy logic and neural networks to improve the accuracy of the use case points method. Results showed that an improvement up to 22% can be obtained using the proposed approach.

SENov 29, 2016
Neural Network Models for Software Development Effort Estimation: A Comparative Study

Ali Bou Nassif, Mohammad Azzeh, Luiz Fernando Capretz et al.

Software development effort estimation (SDEE) is one of the main tasks in software project management. It is crucial for a project manager to efficiently predict the effort or cost of a software project in a bidding process, since overestimation will lead to bidding loss and underestimation will cause the company to lose money. Several SDEE models exist; machine learning models, especially neural network models, are among the most prominent in the field. In this study, four different neural network models: Multilayer Perceptron, General Regression Neural Network, Radial Basis Function Neural Network, and Cascade Correlation Neural Network are compared with each other based on: (1) predictive accuracy centered on the Mean Absolute Error criterion, (2) whether such a model tends to overestimate or underestimate, and (3) how each model classifies the importance of its inputs. Industrial datasets from the International Software Benchmarking Standards Group (ISBSG) are used to train and validate the four models. The main ISBSG dataset was filtered and then divided into five datasets based on the productivity value of each project. Results show that the four models tend to overestimate in 80 percent of the datasets, and the significance of the model inputs varies based on the selected model. Furthermore, the Cascade Correlation Neural Network outperforms the other three models in the majority of the datasets constructed on the Mean Absolute Residual criterion.

SENov 29, 2016
Pareto Efficient Multi Objective Optimization for Local Tuning of Analogy Based Estimation

Mohammad Azzeh, Ali Bou Nassif, Shadi Banitaan et al.

Analogy Based Effort Estimation (ABE) is one of the prominent methods for software effort estimation. The fundamental concept of ABE is closer to the mentality of expert estimation but with an automated procedure in which the final estimate is generated by reusing similar historical projects. The key issue when using ABE is how to adapt the effort of the retrieved nearest neighbors. The adaptation process is an essential part of ABE: it generates more accurate estimates by tuning the selected raw solutions using some adaptation strategy. In this study we show that there are three interrelated decision variables that have great impact on the success of the adaptation method: (1) number of nearest analogies (k), (2) optimum feature set needed for adaptation, and (3) adaptation weights. To find the right decision regarding these variables, one needs to study all possible combinations and evaluate them individually to select the one that can improve all prediction evaluation measures. The existing evaluation measures usually behave differently, sometimes presenting opposite trends in evaluating prediction methods. This means that changing one decision variable could improve one evaluation measure while degrading the others. Therefore, the main theme of this research is how to come up with the best decision variables that improve the adaptation strategy and thus the overall evaluation measures without degrading the others. The impact of these decisions together has not been investigated before; therefore, we propose to view the building of the adaptation procedure as a multi-objective optimization problem. The Particle Swarm Optimization Algorithm (PSO) is utilized to find the optimum solutions for such decision variables based on optimizing multiple evaluation measures.

SEOct 8, 2016
A Hybrid Model for Estimating Software Project Effort from Use Case Points

Mohammad Azzeh, Ali Bou Nassif

Early software effort estimation is a hallmark of successful software project management. Building a reliable effort estimation model usually requires historical data. Unfortunately, since the information available at early stages of software development is scarce, it is recommended to use software size metrics as a key cost factor of effort estimation. Use Case Points (UCP) is a prominent size measure designed mainly for object-oriented projects. Nevertheless, there are no established models that can translate UCP into its corresponding effort; therefore, most models use productivity as a second cost driver. The productivity in those models is usually guessed by experts and does not depend on historical data, which makes it subject to uncertainty. Thus, these models were not well examined using a large number of historical data. In this paper, we designed a hybrid model that consists of classification and prediction stages using a support vector machine and radial basis neural networks. The proposed model was constructed over a large number of observations collected from industrial and student projects. The proposed model was compared against previous UCP prediction models. The validation and empirical results demonstrated that the proposed model significantly surpasses these models on all datasets. The main conclusion is that the environmental factors of UCP can be used to classify and estimate productivity.

SEDec 1, 2015
A Hybrid Intelligent Model for Software Cost Estimation

Wei Lin Du, Luiz Fernando Capretz, Ali Bou Nassif et al.

Accurate software development effort estimation is critical to the success of software projects. Although many techniques and algorithmic models have been developed and implemented by practitioners, accurate software development effort prediction is still a challenging endeavor in the field of software engineering, especially in handling uncertain and imprecise inputs and collinear characteristics. In this paper, a hybrid intelligent model combining a neural network model integrated with a fuzzy model (neuro-fuzzy model) has been used to improve the accuracy of estimating software cost. The performance of the proposed model is assessed by designing and conducting evaluation with published project and industrial data. Results have shown that the proposed model demonstrates the ability of improving the estimation accuracy by 18% based on the Mean Magnitude of Relative Error (MMRE) criterion.

SEAug 28, 2015
A Comparison Between Decision Trees and Decision Tree Forest Models for Software Development Effort Estimation

Ali Bou Nassif, Mohammad Azzeh, Luiz Fernando Capretz et al.

Accurate software effort estimation has been a challenge for many software practitioners and project managers. Underestimation leads to disruption in the project's estimated cost and delivery. On the other hand, overestimation causes outbidding and financial losses in business. Many software estimation models exist; however, none have been proven to be the best in all situations. In this paper, a decision tree forest (DTF) model is compared to a traditional decision tree (DT) model, as well as a multiple linear regression model (MLR). The evaluation was conducted using ISBSG and Desharnais industrial datasets. Results show that the DTF model is competitive and can be used as an alternative in software effort prediction.

SEMay 6, 2014
Analyzing the Non-Functional Requirements in the Desharnais Dataset for Software Effort Estimation

Ali Bou Nassif, Luiz Fernando Capretz, Danny Ho

Studying the quality requirements (aka Non-Functional Requirements (NFR)) of a system is crucial in Requirements Engineering. Many software projects fail because of neglecting or failing to incorporate the NFR during the software development life cycle. This paper focuses on analyzing the importance of the quality requirements attributes in software effort estimation models based on the Desharnais dataset. The Desharnais dataset is a collection of eighty-one software projects, described by twelve attributes, developed by a Canadian software house. The analysis includes studying the influence of each of the quality requirements attributes, as well as the influence of all quality requirements attributes combined when calculating software effort using regression and Artificial Neural Network (ANN) models. The evaluation criteria used in this investigation include the Mean of the Magnitude of Relative Error (MMRE), the Prediction Level (PRED), Root Mean Squared Error (RMSE), Mean Error and the Coefficient of Determination (R²). Results show that the quality attribute Language is the most statistically significant when calculating software effort. Moreover, if all quality requirements attributes are eliminated in the training stage and software effort is predicted based on software size only, the value of the error (MMRE) is doubled.
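
Two of the criteria listed above, MMRE and PRED, are simple to state in code. The sketch below uses their usual definitions: the magnitude of relative error of one project is |actual - predicted| / actual, MMRE is its mean, and PRED(l) is the fraction of projects whose relative error is at most l. The function names, the default level of 0.25, and the numbers in the usage note are illustrative assumptions, not values from the paper.

```python
def magnitude_of_relative_error(actual, predicted):
    """MRE for one project; actual effort must be positive."""
    if actual <= 0:
        raise ValueError("actual effort must be positive")
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean Magnitude of Relative Error over all projects (lower is better)."""
    errors = [magnitude_of_relative_error(a, p) for a, p in zip(actuals, predictions)]
    if not errors:
        raise ValueError("at least one project is required")
    return sum(errors) / len(errors)


def pred(actuals, predictions, level=0.25):
    """PRED(level): share of projects with MRE <= level (higher is better)."""
    errors = [magnitude_of_relative_error(a, p) for a, p in zip(actuals, predictions)]
    if not errors:
        raise ValueError("at least one project is required")
    return sum(1 for e in errors if e <= level) / len(errors)
```

For made-up actual efforts of 100, 200 and 400 and predictions of 110, 150 and 400, the relative errors are 0.10, 0.25 and 0.00, so all three projects fall within PRED(0.25).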