LGMay 24
On the Impact of Class Imbalance on the Learning Dynamics of Deep Neural Networks:An Intuitive InsightIsmail B. Mustapha, Shafaatunnur Hasan, Sunday O. Olatunji et al.
Class imbalance in deep neural networks (DNNs) has witnessed a rapid increase in research attention in recent years. However, the varying accounts of the reasons behind the poor performance of DNN on imbalance data in pertinent literature shows that little is known about how this agelong phenomenon impacts the performance of DNNs. A better understanding of this problem is crucial to developing effective DNN-based imbalance methods. Thus, this study systematically investigates the impact of class imbalance on the learning dynamics of DNN by monitoring the learning pattern of DNN models on both the majority and minority classes of datasets of varying imbalance ratios. Experimental findings shows that as against learning from balanced datasets where DNN learns the classes similarly, class imbalance has severe deteriorating impact on the performance of DNN, driving the model to underfit the minority class samples in the early training epochs while simultaneously learning only the majority class. Although DNN ultimately learns the minority samples, learning in this manner only results in learnt minority representations that are non-generalizable at test phase because they are merely overfitted to keep the overall training loss as low as possible.
LGJan 8, 2025
Intelligent Gradient Boosting Algorithms for Estimating Strength of Modified Subgrade SoilIsmail B. Mustapha, Muyideen Abdulkareem, Shafaatunnur Hasan et al.
The performance of pavement under loading depends on the strength of the subgrade. However, experimental estimation of properties of pavement strengths such as California bearing ratio (CBR), unconfined compressive strength (UCS) and resistance value (R) are often tedious, time-consuming and costly, thereby inspiring a growing interest in machine learning based tools which are simple, cheap and fast alternatives. Thus, the potential application of two boosting techniques; categorical boosting (CatBoost) and extreme gradient boosting (XGBoost) and support vector regression (SVR), is similarly explored in this study for estimation of properties of subgrade soil modified with hydrated lime activated rice husk ash (HARSH). Using 121 experimental data samples of varying proportions of HARSH, plastic limit, liquid limit, plasticity index, clay activity, optimum moisture content, and maximum dry density as input for CBR, UCS and R estimation, four evaluation metrics namely coefficient of determination (R2), root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are used to evaluate the models' performance. The results indicate that XGBoost outperformed CatBoost and SVR in estimating these properties, yielding R2 of 0.9994, 0.9995 and 0.9999 in estimating the CBR, UCS and R respectively. Also, SVR outperformed CatBoost in estimating the CBR and R with R2 of 0.9997 respectively. On the other hand, CatBoost outperformed SVR in estimating the UCS with R2 of 0.9994. Feature sensitivity analysis shows that the three machine learning techniques are unanimous that increasing HARSH proportion lead to values of the estimated properties respectively. A comparison with previous results also shows superiority of XGBoost in estimating subgrade properties.
CRDec 27, 2020
Effective Email Spam Detection System using Extreme Gradient BoostingIsmail B. Mustapha, Shafaatunnur Hasan, Sunday O. Olatunji et al.
The popularity, cost-effectiveness and ease of information exchange that electronic mails offer to electronic device users has been plagued with the rising number of unsolicited or spam emails. Driven by the need to protect email users from this growing menace, research in spam email filtering/detection systems has being increasingly active in the last decade. However, the adaptive nature of spam emails has often rendered most of these systems ineffective. While several spam detection models have been reported in literature, the reported performance on an out of sample test data shows the room for more improvement. Presented in this research is an improved spam detection model based on Extreme Gradient Boosting (XGBoost) which to the best of our knowledge has received little attention spam email detection problems. Experimental results show that the proposed model outperforms earlier approaches across a wide range of evaluation metrics. A thorough analysis of the model results in comparison to the results of earlier works is also presented.