Iqbal H. Sarker

LG
h-index: 43
Papers: 31
Citations: 1,523
Novelty: 25%
AI Score: 33

31 Papers

CR Sep 28, 2023
AI Potentiality and Awareness: A Position Paper from the Perspective of Human-AI Teaming in Cybersecurity

Iqbal H. Sarker, Helge Janicke, Nazeeruddin Mohammad et al.

This position paper explores the broad landscape of AI potentiality in the context of cybersecurity, with a particular emphasis on its possible risk factors with awareness, which can be managed by incorporating human experts in the loop, i.e., "Human-AI" teaming. As artificial intelligence (AI) technologies advance, they will provide unparalleled opportunities for attack identification, incident response, and recovery. However, the successful deployment of AI into cybersecurity measures necessitates an in-depth understanding of its capabilities, challenges, and ethical and legal implications to handle associated risk factors in real-world application areas. Towards this, we emphasize the importance of a balanced approach that incorporates AI's computational power with human expertise. AI systems may proactively discover vulnerabilities and detect anomalies through pattern recognition, and predictive modeling, significantly enhancing speed and accuracy. Human experts can explain AI-generated decisions to stakeholders, regulators, and end-users in critical situations, ensuring responsibility and accountability, which helps establish trust in AI-driven security solutions. Therefore, in this position paper, we argue that human-AI teaming is worthwhile in cybersecurity, in which human expertise such as intuition, critical thinking, or contextual understanding is combined with AI's computational power to improve overall cyber defenses.

CL Apr 17, 2021 [Code]
Emotion Classification in a Resource Constrained Language Using Transformer-based Approach

Avishek Das, Omar Sharif, Mohammed Moshiul Hoque et al.

Although research on emotion classification has progressed significantly in high-resource languages, it is still in its infancy for resource-constrained languages like Bengali. The unavailability of necessary language processing tools and the lack of benchmark corpora make the emotion classification task in Bengali more challenging and complicated. This work proposes a transformer-based technique to classify Bengali text into one of six basic emotions: anger, fear, disgust, sadness, joy, and surprise. A Bengali emotion corpus consisting of 6243 texts is developed for the classification task. Experiments are carried out using various machine learning (LR, RF, MNB, SVM), deep neural network (CNN, BiLSTM, CNN+BiLSTM), and transformer-based (Bangla-BERT, m-BERT, XLM-R) approaches. Experimental outcomes indicate that XLM-R outperforms all other techniques, achieving the highest weighted $f_1$-score of $69.73\%$ on the test data. The dataset is publicly available at https://github.com/omar-sharif03/NAACL-SRW-2021.
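As a reading aid for the reported metric, the weighted F1-score averages the per-class F1 values, weighting each class by its share of the true labels. The sketch below is a minimal pure-Python illustration; the function name and the toy labels are assumptions and do not come from the paper or its dataset.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)          # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label in support:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += support[label] / total * f1   # weight by class share
    return score

# Toy labels (hypothetical), using three of the six emotion classes.
y_true = ["joy", "joy", "anger", "fear"]
y_pred = ["joy", "anger", "anger", "fear"]
print(weighted_f1(y_true, y_pred))
```

Weighting by support keeps frequent classes from being under-represented in the average, which matters when the emotion classes are unevenly sized.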

LG Jan 7, 2024
Detecting Anomalies in Blockchain Transactions using Machine Learning Classifiers and Explainability Analysis

Mohammad Hasan, Mohammad Shahriar Rahman, Helge Janicke et al.

As the use of Blockchain for digital payments continues to rise in popularity, it also becomes susceptible to various malicious attacks. Successfully detecting anomalies within Blockchain transactions is essential for bolstering trust in digital payments. However, the task of anomaly detection in Blockchain transaction data is challenging due to the infrequent occurrence of illicit transactions. Although several studies have been conducted in the field, a limitation persists: the lack of explanations for the model's predictions. This study seeks to overcome this limitation by integrating eXplainable Artificial Intelligence (XAI) techniques and anomaly rules into tree-based ensemble classifiers for detecting anomalous Bitcoin transactions. The Shapley Additive exPlanation (SHAP) method is employed to measure the contribution of each feature, and it is compatible with ensemble models. Moreover, we present rules for interpreting whether a Bitcoin transaction is anomalous or not. Additionally, we have introduced an under-sampling algorithm named XGBCLUS, designed to balance anomalous and non-anomalous transaction data. This algorithm is compared against other commonly used under-sampling and over-sampling techniques. Finally, the outcomes of various tree-based single classifiers are compared with those of stacking and voting ensemble classifiers. Our experimental results demonstrate that: (i) XGBCLUS enhances TPR and ROC-AUC scores compared to state-of-the-art under-sampling and over-sampling techniques, and (ii) our proposed ensemble classifiers outperform traditional single tree-based machine learning classifiers in terms of accuracy, TPR, and FPR scores.
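For orientation, plain random under-sampling, one of the commonly used baselines of the kind the abstract compares against, can be sketched as below. This is not XGBCLUS, whose details are not given in this listing; the function name, the `seed` parameter, and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict

def random_under_sample(rows, labels, seed=0):
    """Randomly drop majority-class rows until every class has the
    same number of rows as the smallest class."""
    rng = random.Random(seed)              # seeded for repeatable sampling
    by_label = defaultdict(list)
    for row, label in zip(rows, labels):
        by_label[label].append(row)
    target = min(len(group) for group in by_label.values())
    sampled_rows, sampled_labels = [], []
    for label, group in by_label.items():
        for row in rng.sample(group, target):
            sampled_rows.append(row)
            sampled_labels.append(label)
    return sampled_rows, sampled_labels

# Toy data (hypothetical): 8 licit transactions, 2 illicit ones.
rows = list(range(10))
labels = ["licit"] * 8 + ["illicit"] * 2
balanced_rows, balanced_labels = random_under_sample(rows, labels)
print(balanced_labels.count("licit"), balanced_labels.count("illicit"))
```

Random selection discards majority rows blindly, which is the usual motivation for more selective under-sampling schemes.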

LG Feb 21, 2024
An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach

Mohammad Amaz Uddin, Md Mahiuddin, Iqbal H. Sarker

Phishing email is a serious cyber threat that tries to deceive users by sending false emails with the intention of stealing confidential information or causing financial harm. Attackers, often posing as trustworthy entities, exploit technological advancements and sophistication to make detection and prevention of phishing more challenging. Despite extensive academic research, phishing detection remains an ongoing and formidable challenge in the cybersecurity landscape. Large Language Models (LLMs) and Masked Language Models (MLMs) possess immense potential to offer innovative solutions to address long-standing challenges. In this research paper, we present an optimized, fine-tuned transformer-based DistilBERT model designed for the detection of phishing emails. In the detection process, we work with a phishing email dataset and utilize the preprocessing techniques to clean and solve the imbalance class issues. Through our experiments, we found that our model effectively achieves high accuracy, demonstrating its capability to perform well. Finally, we demonstrate our fine-tuned model using Explainable-AI (XAI) techniques such as Local Interpretable Model-Agnostic Explanations (LIME) and Transformer Interpret to explain how our model makes predictions in the context of text classification for phishing emails.

LG May 12, 2024
ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis

Mohammad Amaz Uddin, Muhammad Nazrul Islam, Leandros Maglaras et al.

SMS, or short messaging service, is a widely used and cost-effective communication medium that has sadly turned into a haven for unwanted messages, commonly known as SMS spam. With the rapid adoption of smartphones and Internet connectivity, SMS spam has emerged as a prevalent threat. Spammers have taken notice of the significance of SMS for mobile phone users. Consequently, with the emergence of new cybersecurity threats, the volume of SMS spam has grown significantly in recent years. The unstructured format of SMS data creates significant challenges for SMS spam detection, making it more difficult to fight spam attacks in the cybersecurity domain. In this work, we employ optimized and fine-tuned transformer-based Large Language Models (LLMs) to solve the problem of spam message detection. We use a benchmark SMS spam dataset, apply several preprocessing techniques to obtain clean, noise-free data, and address the class imbalance problem using text augmentation. The overall experiment showed that RoBERTa, our optimized and fine-tuned variant of BERT (Bidirectional Encoder Representations from Transformers), achieved a high accuracy of 99.84%. We also use Explainable Artificial Intelligence (XAI) techniques to calculate positive and negative coefficient scores, which explore and explain the transparency of the fine-tuned model in this text-based SMS spam detection task. In addition, traditional Machine Learning (ML) models were examined to compare their performance with the transformer-based models. This analysis describes how LLMs can have a positive impact on complex text-based spam data in the cybersecurity field.

AI May 29, 2025
A Unified Framework for Human AI Collaboration in Security Operations Centers with Trusted Autonomy

Ahmad Mohsin, Helge Janicke, Ahmed Ibrahim et al.

This article presents a structured framework for Human-AI collaboration in Security Operations Centers (SOCs), integrating AI autonomy, trust calibration, and Human-in-the-loop decision making. Existing frameworks in SOCs often focus narrowly on automation, lacking systematic structures to manage human oversight, trust calibration, and scalable autonomy with AI. Many assume static or binary autonomy settings, failing to account for the varied complexity, criticality, and risk across SOC tasks considering Humans and AI collaboration. To address these limitations, we propose a novel autonomy tiered framework grounded in five levels of AI autonomy from manual to fully autonomous, mapped to Human-in-the-Loop (HITL) roles and task-specific trust thresholds. This enables adaptive and explainable AI integration across core SOC functions, including monitoring, protection, threat detection, alert triage, and incident response. The proposed framework differentiates itself from previous research by creating formal connections between autonomy, trust, and HITL across various SOC levels, which allows for adaptive task distribution according to operational complexity and associated risks. The framework is exemplified through a simulated cyber range that features the cybersecurity AI-Avatar, a fine-tuned LLM-based SOC assistant. The AI-Avatar case study illustrates human-AI collaboration for SOC tasks, reducing alert fatigue, enhancing response coordination, and strategically calibrating trust. This research systematically presents both the theoretical and practical aspects and feasibility of designing next-generation cognitive SOCs that leverage AI not to replace but to enhance human decision-making.

CR Mar 28, 2024
A Data-Driven Predictive Analysis on Cyber Security Threats with Key Risk Factors

Fatama Tuz Johora, Md Shahedul Islam Khan, Esrath Kanon et al.

Cyber risk refers to the risk of reputational damage, monetary loss, or disruption to an organization or individuals, and it usually arises from the unwitting use of cyber systems. Cyber risk is gradually increasing day by day and is now a global threat. Developing countries like Bangladesh face major cyber risk challenges. The growing cyber threat worldwide highlights the need for effective modeling to predict and manage the associated risk. This paper presents a Machine Learning (ML) based model for predicting individuals who may be victims of cyber attacks by analyzing socioeconomic factors. We collected the dataset from victims and non-victims of cyberattacks based on socio-demographic features. The study involved the development of a questionnaire to gather data, which was then used to measure the significance of features. Through data augmentation, the dataset was expanded to encompass 3286 entries, setting the stage for our investigation and modeling. Among several ML models with 19, 20, 21, and 26 features, we proposed a novel Pertinent Features Random Forest (RF) model, which achieved maximum accuracy with 20 features (95.95%) and also demonstrated the association among the selected features using the Apriori algorithm with Confidence (above 80%) according to the victim. We generated 10 important association rules and presented the framework that is rigorously evaluated on real-world datasets, demonstrating its potential to predict cyberattacks and associated risk factors effectively. Looking ahead, future efforts will be directed toward refining the predictive model's precision and delving into additional risk factors, to fortify the proposed framework's efficacy in navigating the complex terrain of cybersecurity threats.
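To make the confidence threshold concrete: for a rule A -> B, confidence is support(A and B) divided by support(A). The sketch below computes only these two measures on a toy set of records; it does not perform Apriori's candidate generation, and the feature names and records are hypothetical rather than taken from the paper's questionnaire.

```python
def support(records, itemset):
    """Fraction of records that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for record in records if itemset <= record) / len(records)

def confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = support(records, set(antecedent) | set(consequent))
    ante = support(records, antecedent)
    return both / ante if ante else 0.0

# Toy survey records (hypothetical feature names).
records = [
    {"shares_password", "victim"},
    {"shares_password", "public_wifi", "victim"},
    {"shares_password", "victim"},
    {"public_wifi"},
    {"shares_password"},
]
rule_conf = confidence(records, {"shares_password"}, {"victim"})
print(rule_conf, rule_conf > 0.8)
```

A rule would be retained under an 80% cut-off only if its confidence exceeds 0.8; the toy rule here falls short of that.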

LG Sep 12, 2025
SME-TEAM: Leveraging Trust and Ethics for Secure and Responsible Use of AI and LLMs in SMEs

Iqbal H. Sarker, Helge Janicke, Ahmad Mohsin et al.

Artificial Intelligence (AI) and Large Language Models (LLMs) are revolutionizing today's business practices; however, their adoption within small and medium-sized enterprises (SMEs) raises serious trust, ethical, and technical issues. In this perspective paper, we introduce a structured, multi-phased framework, "SME-TEAM" for the secure and responsible use of these technologies in SMEs. Based on a conceptual structure of four key pillars, i.e., Data, Algorithms, Human Oversight, and Model Architecture, SME-TEAM bridges theoretical ethical principles with operational practice, enhancing AI capabilities across a wide range of applications in SMEs. Ultimately, this paper provides a structured roadmap for the adoption of these emerging technologies, positioning trust and ethics as a driving force for resilience, competitiveness, and sustainable innovation within the area of business analytics and SMEs.

LG Mar 4, 2025
AI Enabled User-Specific Cyberbullying Severity Detection with Explainability

Tabia Tanzin Prama, Jannatul Ferdaws Amrin, Md. Mushfique Anwar et al.

The rise of social media has significantly increased the prevalence of cyberbullying (CB), posing serious risks to both mental and physical well-being. Effective detection systems are essential for mitigating its impact. While several machine learning (ML) models have been developed, few incorporate victims' psychological, demographic, and behavioral factors alongside bullying comments to assess severity. In this study, we propose an AI model integrating user-specific attributes, including psychological factors (self-esteem, anxiety, depression), online behavior (internet usage, disciplinary history), and demographic attributes (race, gender, ethnicity), along with social media comments. Additionally, we introduce a re-labeling technique that categorizes social media comments into three severity levels: Not Bullying, Mild Bullying, and Severe Bullying, considering user-specific factors. Our LSTM model is trained using 146 features, incorporating emotional, topical, and word2vec representations of social media comments as well as user-level attributes, and it outperforms existing baseline models, achieving the highest accuracy of 98% and an F1-score of 0.97. To identify key factors influencing the severity of cyberbullying, we employ explainable AI techniques (SHAP and LIME) to interpret the model's decision-making process. Our findings reveal that, beyond hate comments, victims belonging to specific racial and gender groups are more frequently targeted and exhibit higher incidences of depression, disciplinary issues, and low self-esteem. Additionally, individuals with a prior history of bullying are at a greater risk of becoming victims of cyberbullying.

LG Jan 21, 2024
Agricultural Recommendation System based on Deep Learning: A Multivariate Weather Forecasting Approach

Md Zubair, Md. Shahidul Salim, Mehrab Mustafy Rahman et al.

Agriculture plays a fundamental role in driving economic growth and ensuring food security for populations around the world. Although labor-intensive agriculture has led to steady increases in food grain production in many developing countries, it is frequently challenged by adverse weather conditions, including heavy rainfall, low temperatures, and drought. These factors substantially hinder food production, posing significant risks to global food security. In order to have a profitable, sustainable, and farmer-friendly agricultural practice, this paper proposes a context-based crop recommendation system powered by a weather forecast model. For implementation purposes, we have considered the whole territory of Bangladesh. With extensive evaluation, the multivariate Stacked Bi-LSTM (three Bi-LSTM layers with a time Distributed layer) Network is employed as the weather forecasting model. The proposed weather model can forecast Rainfall, Temperature, Humidity, and Sunshine for any given location in Bangladesh with an average R-Squared value of 0.9824, and the model outperforms other state-of-the-art LSTM models. These predictions guide our system in generating viable farming decisions. Additionally, our full-fledged system is capable of alerting the farmers about extreme weather conditions so that preventive measures can be undertaken to protect the crops. Finally, the system is also adept at making knowledge-based crop suggestions for flood and drought-prone regions.

IR Aug 13, 2021
A Dynamic Topic Identification and Labeling Approach of COVID-19 Tweets

Khandaker Tayef Shahriar, Iqbal H. Sarker, Muhammad Nazrul Islam et al.

This paper formulates the problem of dynamically identifying key topics with proper labels from COVID-19 tweets to provide an overview of wider public opinion. Nowadays, social media is one of the best ways to connect people through Internet technology and is considered an essential part of our daily lives. In late December 2019, an outbreak of the novel coronavirus, COVID-19, was reported, and the World Health Organization declared an emergency due to its rapid spread all over the world. The COVID-19 epidemic has affected the use of social media by many people across the globe. Twitter is one of the most influential social media services and has seen a dramatic increase in use since the epidemic began. Thus, dynamically extracting specific topics with labels from COVID-19 tweets, rather than relying on manual topic labeling, is a challenging issue for highlighting conversation. In this paper, we propose a framework that automatically identifies the key topics with labels from the tweets using the top unigram feature of the aspect-term clusters from Latent Dirichlet Allocation (LDA) generated topics. Our experimental results show that this dynamic topic identification and labeling approach is effective, achieving an accuracy of 85.48% with respect to the manual static approach.

CR Mar 28, 2021
CyberLearning: Effectiveness Analysis of Machine Learning Security Modeling to Detect Cyber-Anomalies and Multi-Attacks

Iqbal H. Sarker

Detecting cyber-anomalies and attacks is becoming a rising concern in the domain of cybersecurity. Artificial intelligence, particularly machine learning techniques, can be used to tackle these issues. However, the effectiveness of a learning-based security model may vary depending on the security features and the data characteristics. In this paper, we present "CyberLearning", a machine learning-based cybersecurity modeling approach with correlated-feature selection, and a comprehensive empirical analysis of the effectiveness of various machine learning-based security models. In our CyberLearning modeling, we take into account a binary classification model for detecting anomalies and a multi-class classification model for various types of cyber-attacks. To build the security model, we first employ ten popular machine learning classification techniques: Naive Bayes, Logistic Regression, Stochastic Gradient Descent, K-Nearest Neighbors, Support Vector Machine, Decision Tree, Random Forest, Adaptive Boosting, eXtreme Gradient Boosting, and Linear Discriminant Analysis. We then present the artificial neural network-based security model considering multiple hidden layers. The effectiveness of these learning-based security models is examined by conducting a range of experiments utilizing the two most popular security datasets, UNSW-NB15 and NSL-KDD. Overall, this paper aims to serve as a reference point for data-driven security modeling through our experimental analysis and findings in the context of cybersecurity.

LG Dec 21, 2020
An Efficient K-means Clustering Algorithm for Analysing COVID-19

Md. Zubair, MD. Asif Iqbal, Avijeet Shil et al.

COVID-19 hit the world like a storm, creating pandemic situations in most countries around the world. The whole world is trying to overcome this pandemic. Better health care quality may help a country tackle the pandemic. Clustering countries with similar health care quality provides insight into the quality of health care in different countries. In the area of machine learning and data science, the K-means clustering algorithm is typically used to create clusters based on similarity. In this paper, we propose an efficient K-means clustering method that determines the initial centroids of the clusters efficiently. Based on this proposed method, we have determined health care quality clusters of countries utilizing the COVID-19 datasets. Experimental results show that our proposed method reduces the number of iterations and the execution time needed to analyze COVID-19 data compared with the traditional k-means clustering algorithm.
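The abstract does not say how the initial centroids are chosen, so the sketch below substitutes a simple farthest-point (maximin) initialization purely to illustrate why the starting centroids affect the iteration count. It is not the paper's method; the function names and toy points are assumptions.

```python
import math

def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids

def kmeans(points, centroids, max_iter=100):
    """Standard Lloyd iterations; returns final centroids and the number
    of iterations run until the centroids stopped changing."""
    centroids = list(centroids)
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]       # keep centroid of empty cluster
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            return centroids, iteration
        centroids = new_centroids
    return centroids, max_iter

# Toy 2-D points forming two well-separated groups (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
spread_centroids, spread_iters = kmeans(points, farthest_point_init(points, 2))
naive_centroids, naive_iters = kmeans(points, points[:2])  # two adjacent seeds
print(spread_iters, naive_iters)
```

On this toy data the spread-out seeds converge in fewer iterations than two seeds taken from the same group, which is the kind of saving the abstract reports for its own initialization scheme.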

LG Dec 9, 2020
Predicting Individual Substance Abuse Vulnerability using Machine Learning Techniques

Uwaise Ibna Islam, Iqbal H. Sarker, Enamul Haque et al.

Substance abuse is the unrestrained and detrimental use of psychoactive chemical substances, unauthorized drugs, and alcohol. Continuous use of these substances can ultimately lead a human to disastrous consequences. As patients display a high rate of relapse, prevention at an early stage can be an effective restraint. We therefore propose a binary classifier to identify any individual's present vulnerability towards substance abuse by analyzing subjects' socio-economic environment. We have collected data by a questionnaire which is created after carefully assessing the commonly involved factors behind substance abuse. Pearson's chi-squared test of independence is used to identify key feature variables influencing substance abuse. Later we build the predictive classifiers using machine learning classification algorithms on those variables. Logistic regression classifier trained with 18 features can predict individual vulnerability with the best accuracy.
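The feature-screening step can be illustrated with the Pearson chi-squared statistic for a contingency table: expected counts come from the row and column totals, and the statistic sums (observed - expected)^2 / expected over all cells. The sketch below is a minimal pure-Python version without continuity correction; the table values are made up and are not the paper's data.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table given as a
    list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 table: rows = feature present / absent,
# columns = vulnerable / not vulnerable.
table = [[30, 10],
         [20, 40]]
stat = chi_squared_statistic(table)
dof = (len(table) - 1) * (len(table[0]) - 1)
print(stat, dof)
```

With one degree of freedom, the 5% critical value is about 3.841, so a statistic above that would mark the feature as associated with the outcome in this toy setting.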

LG Dec 9, 2020
An Isolation Forest Learning Based Outlier Detection Approach for Effectively Classifying Cyber Anomalies

Rony Chowdhury Ripan, Iqbal H. Sarker, Md Musfique Anwar et al.

Cybersecurity has recently gained considerable interest in today's security issues because of the popularity of the Internet-of-Things (IoT), the considerable growth of mobile networks, and many related apps. Therefore, detecting numerous cyber-attacks in a network and creating an effective intrusion detection system plays a vital role in today's security. In this paper, we present an Isolation Forest Learning-Based Outlier Detection Model for effectively classifying cyber anomalies. In order to evaluate the efficacy of the resulting Outlier Detection model, we also use several conventional machine learning approaches, such as Logistic Regression (LR), Support Vector Machine (SVM), AdaBoost Classifier (ABC), Naive Bayes (NB), and K-Nearest Neighbor (KNN). The effectiveness of our proposed Outlier Detection model is evaluated by conducting experiments on Network Intrusion Dataset with evaluation metrics such as precision, recall, F1-score, and accuracy. Experimental results show that the classification accuracy of cyber anomalies has been improved after removing outliers.

CL Nov 19, 2020
SentiLSTM: A Deep Learning Approach for Sentiment Analysis of Restaurant Reviews

Eftekhar Hossain, Omar Sharif, Mohammed Moshiul Hoque et al.

The amount of textual data generated has increased enormously due to effortless access to the Internet and the evolution of various web 2.0 applications. This textual data is produced because people express their opinions, emotions, or sentiments about products or services in the form of tweets, Facebook posts or statuses, blog write-ups, and reviews. Sentiment analysis deals with the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude toward a particular topic is positive, negative, or neutral. The impact of customer reviews is significant for perceiving customer attitudes towards a restaurant. Thus, the automatic detection of sentiment from reviews is advantageous for restaurant owners, service providers, and customers in making their decisions or services more satisfactory. This paper proposes a deep learning-based technique (i.e., BiLSTM) to classify the reviews provided by restaurant clients into positive and negative polarities. A corpus consisting of 8435 reviews is constructed to evaluate the proposed technique. In addition, a comparative analysis of the proposed technique with other machine learning algorithms is presented. The results of the evaluation on the test dataset show that the BiLSTM technique produced the highest accuracy of 91.35%.

LG Aug 3, 2020
A Survey on the Use of AI and ML for Fighting the COVID-19 Pandemic

Muhammad Nazrul Islam, Toki Tahmid Inan, Suzzana Rafi et al.

Artificial intelligence (AI) and machine learning (ML) have brought about a paradigm shift in health care, where they can be used for decision support and forecasting by exploring medical data. Recent studies showed that AI and ML can be used to fight the COVID-19 pandemic. Therefore, the objective of this review study is to summarize the recent AI and ML based studies that have focused on fighting the COVID-19 pandemic. From an initial set of 634 articles, a total of 35 articles were finally selected through an extensive inclusion-exclusion process. In our review, we have explored the objectives/aims of the existing studies (i.e., the role of AI/ML in fighting the COVID-19 pandemic); the context of the study (i.e., whether the study focused on a specific country context or took a global perspective); the type and volume of the dataset; the methodology, algorithms, or techniques adopted in the prediction or diagnosis processes; and the mapping of the algorithms/techniques to the data type, highlighting their prediction/classification accuracy. We particularly focused on the uses of AI/ML in analyzing the pandemic data in order to depict the most recent progress of AI for fighting COVID-19 and pointed out the potential scope of further research.

LG Mar 11, 2020
Crime Prediction Using Spatio-Temporal Data

Sohrab Hossain, Ahmed Abtahee, Imran Kashem et al.

A crime is a punishable offence that is harmful to an individual and to society. It is important to understand the patterns of criminal activity in order to prevent it. Research can help society prevent and solve criminal activities. Studies show that only 10 percent of offenders commit 50 percent of all offences. The enforcement team can respond faster if they have early information and prior knowledge about criminal activity at different points in a city. In this paper, a supervised learning technique is used to predict crimes with better accuracy. The proposed system predicts crimes by analyzing a dataset that contains records of previously committed crimes and their patterns. The system is built on two main algorithms: i) decision tree, and ii) k-nearest neighbor. The Random Forest algorithm and AdaBoost are used to increase the accuracy of the prediction. Finally, oversampling is used for better accuracy. The proposed system is fed with a criminal-activity dataset covering twelve years from the city of San Francisco.

LG Dec 17, 2019
BehavDT: A Behavioral Decision Tree Learning to Build User-Centric Context-Aware Predictive Model

Iqbal H. Sarker, Alan Colman, Jun Han et al.

This paper formulates the problem of building a context-aware predictive model based on user diverse behavioral activities with smartphones. In the area of machine learning and data science, a tree-like model as that of decision tree is considered as one of the most popular classification techniques, which can be used to build a data-driven predictive model. The traditional decision tree model typically creates a number of leaf nodes as decision nodes that represent context-specific rigid decisions, and consequently may cause overfitting problem in behavior modeling. However, in many practical scenarios within the context-aware environment, the generalized outcomes could play an important role to effectively capture user behavior. In this paper, we propose a behavioral decision tree, "BehavDT" context-aware model that takes into account user behavior-oriented generalization according to individual preference level. The BehavDT model outputs not only the generalized decisions but also the context-specific decisions in relevant exceptional cases. The effectiveness of our BehavDT model is studied by conducting experiments on individual user real smartphone datasets. Our experimental results show that the proposed BehavDT context-aware model is more effective when compared with the traditional machine learning approaches, in predicting user diverse behaviors considering multi-dimensional contexts.

CY Sep 2, 2019
CalBehav: A Machine Learning based Personalized Calendar Behavioral Model using Time-Series Smartphone Data

Iqbal H. Sarker, Alan Colman, Jun Han et al.

The electronic calendar is a valuable resource nowadays for managing our daily appointments or schedules, also known as events, ranging from professional to highly personal. Researchers have studied various types of calendar events to predict smartphone user behavior for incoming mobile communications. However, these studies typically do not take into account behavioral variations between individuals. In the real world, smartphone users can differ widely from each other in how they respond to incoming communications during their scheduled events. Moreover, an individual user may respond to incoming communications differently in different contexts, depending on what type of event is scheduled in her personal calendar. Thus, a static calendar-based behavioral model for individual smartphone users does not necessarily reflect their behavior toward incoming communications. In this paper, we present a machine learning based context-aware model, named "CalBehav", that is personalized and dynamically identifies an individual's dominant behavior for their scheduled events using logged time-series smartphone data. The experimental results, based on real datasets from calendar and phone logs, show that this data-driven personalized model is more effective for intelligently managing incoming mobile communications than existing calendar-based approaches.

CY Aug 26, 2019
AppsPred: Predicting Context-Aware Smartphone Apps using Random Forest Learning

Iqbal H. Sarker, Khaled Salah

Due to the popularity of context-awareness in the Internet of Things (IoT) and the recent advanced features of the most popular IoT device, i.e., the smartphone, modeling and predicting personalized usage behavior based on relevant contexts can be highly useful in assisting users to carry out daily routines and activities. Usage patterns of different categories of smartphone apps, such as social networking, communication, entertainment, or daily life services, usually vary greatly between individuals. People use these apps differently in different contexts, such as temporal context, spatial context, individual mood and preference, work status, Internet connectivity such as Wi-Fi status, or device-related status such as phone profile and battery level. Thus, we consider individuals' app usage as a multi-class context-aware problem for personalized modeling and prediction. Random Forest learning is one of the most popular machine learning techniques for building a multi-class prediction model. Therefore, in this paper, we present an effective context-aware smartphone app prediction model, named "AppsPred", using a random forest machine learning technique that takes into account the optimal number of trees based on such multi-dimensional contexts to build the resultant forest. The effectiveness of this model is examined by conducting experiments on smartphone app usage datasets collected from individual users. The experimental results show that our AppsPred significantly outperforms other popular machine learning classification approaches, such as ZeroR, Naive Bayes, Decision Tree, Support Vector Machines, and Logistic Regression, when predicting smartphone apps in various context-aware test cases.

CYAug 25, 2019
E-MIIM: An Ensemble Learning based Context-Aware Mobile Telephony Model for Intelligent Interruption Management

Iqbal H. Sarker, A. S. M. Kayes, Md Hasan Furhad et al.

Nowadays, mobile telephony interruptions in our daily life activities are common because of inappropriate ringing notifications of incoming phone calls in different contexts. Such interruptions may affect the attention of not only the mobile phone owners but also the surrounding people. The decision tree is the most popular machine learning classification technique used in the existing context-aware mobile intelligent interruption management (MIIM) model to overcome such issues. However, a context-aware model based on a single decision tree may overfit and thus decrease the prediction accuracy of the inferred model. Therefore, in this paper, we propose an ensemble machine learning based context-aware mobile telephony model for intelligent interruption management that takes into account multi-dimensional contexts, and name it "E-MIIM". The experimental results on individuals' real-life mobile telephony datasets show that our E-MIIM model is more effective than, and outperforms, the existing MIIM model for predicting and managing individuals' mobile telephony interruptions based on their relevant contextual information.

SIFeb 11, 2019
A Machine Learning based Robust Prediction Model for Real-life Mobile Phone Data

Iqbal H. Sarker

Real-life mobile phone data may contain noisy instances, which is a fundamental issue for building a prediction model, with many potential negative consequences. The complexity of the inferred model may increase, overfitting may arise, and the overall prediction accuracy of the model may thereby decrease. In this paper, we address these issues and present a robust prediction model for real-life mobile phone data of individual users, in order to improve the prediction accuracy of the model. In our robust model, we first identify and eliminate the noisy instances from the training dataset by determining a dynamic noise threshold using a naive Bayes classifier and the Laplace estimator; this threshold may differ from user to user according to their unique behavioral patterns. After that, we employ the most popular rule-based machine learning classification technique, i.e., the decision tree, on the noise-free dataset to build the prediction model. Experimental results on real-life mobile phone datasets (e.g., phone call logs) of individual mobile phone users show the effectiveness of our robust model in terms of precision, recall, and f-measure.

LGJan 10, 2019
Performance Analysis of Machine Learning Techniques to Predict Diabetes Mellitus

Md. Faisal Faruque, Asaduzzaman, Iqbal H. Sarker

Diabetes mellitus is a common disease of the human body caused by a group of metabolic disorders in which sugar levels remain very high over a prolonged period. It affects different organs of the human body and thereby harms many of the body's systems, in particular the blood vessels and nerves. Early prediction of the disease can help control it and save human lives. To achieve this goal, this research work mainly explores various risk factors related to the disease using machine learning techniques. Machine learning techniques provide an efficient way to extract knowledge by constructing predictive models from diagnostic medical datasets collected from diabetic patients. Extracting knowledge from such data can be useful for predicting diabetic patients. In this work, we employ four popular machine learning algorithms, namely Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN), and C4.5 Decision Tree, on adult population data to predict diabetes mellitus. Our experimental results show that the C4.5 decision tree achieved higher accuracy than the other machine learning techniques.

CYNov 15, 2018
Individualized Time-Series Segmentation for Mining Mobile Phone User Behavior

Iqbal H. Sarker, Alan Colman, MA Kabir et al.

Mobile phones can record an individual's daily behavioral data as a time-series. In this paper, we present an effective time-series segmentation technique that extracts optimal time segments of an individual's similar behavioral characteristics utilizing their mobile phone data. One of the determinants of an individual's behavior is the various activities undertaken at various times-of-the-day and days-of-the-week. In many cases, such behavior will follow temporal patterns. Currently, researchers use either equal or unequal interval-based segmentation of time for mining mobile phone users' behavior. Most of them take into account static temporal coverage of 24-hours-a-day, and a few of them take into account the number of incidences in time-series data. However, such segmentations do not necessarily map to the patterns of individual user activity and subsequent behavior, because they do not take into account the diverse behaviors of individuals over time-of-the-week. Therefore, we propose a behavior-oriented time segmentation (BOTS) technique that dynamically takes into account not only the temporal coverage of the week but also the number of incidences of diverse behaviors, in order to produce similar behavioral time segments over the week utilizing time-series data. Experiments on real mobile phone datasets show that our proposed segmentation technique better captures the user's dominant behavior at various times-of-the-day and days-of-the-week, enabling the generation of high-confidence temporal rules for mining individual mobile phone users' behavior.

LGOct 30, 2018
Research Issues in Mining User Behavioral Rules for Context-Aware Intelligent Mobile Applications

Iqbal H. Sarker

Context-awareness in smart mobile applications is a growing area of study because of the intelligence it brings to applications. In order to build context-aware intelligent applications, mining contextual behavioral rules of individual smartphone users from their phone log data is the key. However, to mine these rules, a number of issues need to be investigated, such as the quality of smartphone data, understanding the relevancy of contexts, discretization of continuous contextual data, discovery of useful behavioral rules of individuals and their ordering, knowledge-based interactive post-mining for semantic understanding, and dynamic updating and management of rules according to users' present behavior. In this paper, we briefly discuss these issues and their potential solution directions for mining individuals' behavioral rules, for the purpose of building various context-aware intelligent mobile applications. We also summarize a number of real-life rule-based applications that intelligently assist individual smartphone users in their daily activities according to their behavioral rules.

CYOct 15, 2018
Mobile Data Science: Towards Understanding Data-Driven Intelligent Mobile Applications

Iqbal H. Sarker

Due to the popularity of smart mobile phones and context-aware technology, various contextual data relevant to users' diverse activities with mobile phones is available around us. This enables the study of mobile phone data and context-awareness in computing, for the purpose of building data-driven intelligent mobile applications, not only on a single device but also in a distributed environment, for the benefit of end users. Based on the availability of mobile phone data and the usefulness of data-driven applications, in this paper we discuss mobile data science, which involves collecting mobile phone data from various sources and building data-driven models using machine learning techniques in order to make dynamic decisions intelligently in various day-to-day situations of the users. For this, we first discuss the fundamental concepts and the potential of mobile data science for building intelligent applications. We also highlight the key elements and explain the key modules involved in the process of mobile data science. This article is the first in the field to draw a big picture of mobile data science and its potential for developing various data-driven intelligent mobile applications. We believe this study will help both researchers and application developers build smart data-driven mobile applications to assist end mobile phone users in their daily activities.

CYOct 15, 2018
SilentPhone: Inferring User Unavailability based Opportune Moments to Minimize Call Interruptions

Iqbal H. Sarker

The increasing popularity of cell phones has made them the most personal and ubiquitous communication devices nowadays. Typically, the ringing notifications of mobile phones are used to inform users about incoming calls. However, the notifications of inappropriate incoming calls sometimes cause interruptions not only for the users but also for the surrounding people. In this paper, we present a data-driven approach that infers the opportune moments for such phone call interruptions based on a user's unavailability, i.e., when a user is unable to answer incoming phone calls, by analyzing the individual's past phone log data, and that discovers the corresponding rules for configuring the phone's silent mode, for the purpose of minimizing call interruptions in an automated intelligent system. Experiments on real mobile phone datasets show that our approach is able to identify the opportune moments for call interruptions and generates the corresponding silent mode configuration rules by capturing the dominant behavior of individual users at various times-of-the-day and days-of-the-week.

CYOct 15, 2018
Understanding the Role of Data-Centric Social Context in Personalized Mobile Applications

Iqbal H. Sarker

Context-awareness in personalized mobile applications is a growing area of study. Social context is one of the most important sources of information in human-activity based applications. In this paper, we mainly focus on social relational context, which represents the interpersonal relationship between individuals, and on the role or influence of such context on users' diverse phone call activities in their real-world life. Individuals' phone call activities, such as making a phone call to a particular person or responding to an incoming call, may differ from person to person based on their interpersonal relationships, such as family, friend, or colleague. However, it is very difficult to make the device and the relevant context-aware applications understand such semantic relationships between individuals. To address this issue, in this paper we explore the data-centric social relational context that can play a significant role in building context-aware personalized mobile applications for various purposes in our real-world life.

DBMar 19, 2018
Mining User Behavioral Rules from Smartphone Data through Association Analysis

Iqbal H. Sarker, Flora D. Salim

The increasing popularity of smart mobile phones and their powerful sensing capabilities have enabled the collection of rich contextual information and mobile phone usage records through the device logs. This paper formulates the problem of mining behavioral association rules of individual mobile phone users utilizing their smartphone data. Association rule learning is the most popular technique to discover rules utilizing large datasets. However, it is well-known that a large proportion of association rules generated are redundant. This redundant production makes not only the rule-set unnecessarily large but also makes the decision making process more complex and ineffective. In this paper, we propose an approach that effectively identifies the redundancy in associations and extracts a concise set of behavioral association rules that are non-redundant. The effectiveness of the proposed approach is examined by considering the real mobile phone datasets of individual users.
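
The abstract does not state how redundancy is detected, so the following is only a minimal sketch under one common assumption: a rule counts as redundant when a more general rule (a strict subset of its antecedent, same consequent) already reaches at least the same confidence. The `Rule` type, the `prune_redundant` helper, and the context labels are hypothetical names chosen for illustration, not taken from the paper.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]  # contexts, e.g. {"meeting", "monday"}
    consequent: str             # behavior, e.g. "reject"
    confidence: float


def prune_redundant(rules: List[Rule]) -> List[Rule]:
    """Keep only rules that no more general rule already covers.

    Assumed redundancy criterion (not from the paper): rule r is redundant
    if some rule g has the same consequent, an antecedent that is a strict
    subset of r's antecedent, and confidence >= r's confidence.
    """
    kept = []
    for r in rules:
        redundant = any(
            g.consequent == r.consequent
            and g.antecedent < r.antecedent  # strict subset test
            and g.confidence >= r.confidence
            for g in rules
        )
        if not redundant:
            kept.append(r)
    return kept


rules = [
    Rule(frozenset({"meeting"}), "reject", 0.90),
    Rule(frozenset({"meeting", "monday"}), "reject", 0.90),
    Rule(frozenset({"meeting", "boss"}), "accept", 0.95),
]
concise = prune_redundant(rules)
```

Here the second rule adds the context "monday" without improving confidence, so it is dropped, while the third rule survives because it predicts a different behavior.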

LGOct 12, 2017
An Improved Naive Bayes Classifier-based Noise Detection Technique for Classifying User Phone Call Behavior

Iqbal H. Sarker, Muhammad Ashad Kabir, Alan Colman et al.

The presence of noisy instances in mobile phone data is a fundamental issue for classifying user phone call behavior (i.e., accept, reject, missed, and outgoing), with many potential negative consequences. The classification accuracy may decrease and the complexity of the classifiers may increase due to the number of redundant training samples. To detect such noisy instances in a training dataset, researchers use the naive Bayes classifier (NBC), as it identifies misclassified instances by taking into account the independence assumption and the conditional probabilities of the attributes. However, some of these misclassified instances might indicate the usage behavior patterns of individual mobile phone users. Existing naive Bayes classifier based noise detection techniques have not considered this issue and thus are lacking in classification accuracy. In this paper, we propose an improved noise detection technique based on the naive Bayes classifier for effectively classifying users' phone call behaviors. In order to improve the classification accuracy, we identify noisy instances in the training dataset by analyzing the behavioral patterns of individuals. We dynamically determine a noise threshold according to an individual's unique behavioral patterns by using both the naive Bayes classifier and the Laplace estimator. We use this noise threshold to identify noisy instances. To measure the effectiveness of our technique in classifying user phone call behavior, we employ the most popular classification algorithm (e.g., decision tree). Experimental results on a real phone call log dataset show that our proposed technique more accurately identifies the noisy instances in the training datasets, which leads to better classification accuracy.
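
As a rough illustration of the idea, the sketch below trains a naive Bayes model with Laplace (add-one) smoothing and flags a training instance as noisy when the posterior probability of its own label falls below a threshold. The abstract does not give the rule for deriving the per-user dynamic threshold, so the threshold here is simply passed in as a parameter; the function names and the toy "place" attribute are assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Instance = Tuple[Dict[str, str], str]  # (context attributes, behavior label)


def train_nb(data: List[Instance]):
    """Collect the counts needed for a categorical naive Bayes model."""
    class_counts = Counter(label for _, label in data)
    value_counts = defaultdict(Counter)  # (attribute, label) -> value counts
    attr_values = defaultdict(set)       # attribute -> distinct values seen
    for attrs, label in data:
        for a, v in attrs.items():
            value_counts[(a, label)][v] += 1
            attr_values[a].add(v)
    return class_counts, value_counts, attr_values


def posterior(attrs: Dict[str, str], label: str, model) -> float:
    """P(label | attrs) with Laplace-smoothed priors and likelihoods."""
    class_counts, value_counts, attr_values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = (n_c + 1) / (total + len(class_counts))
        for a, v in attrs.items():
            k = len(attr_values[a])  # number of distinct values of attribute a
            p *= (value_counts[(a, c)][v] + 1) / (n_c + k)
        scores[c] = p
    return scores[label] / sum(scores.values())


def find_noisy(data: List[Instance], threshold: float) -> List[int]:
    """Indices of instances whose own label has posterior below threshold.

    The threshold is a plain parameter here; the paper derives it
    dynamically per user, by a rule the abstract does not spell out.
    """
    model = train_nb(data)
    return [
        i for i, (attrs, label) in enumerate(data)
        if posterior(attrs, label, model) < threshold
    ]


calls = (
    [({"place": "office"}, "reject")] * 5
    + [({"place": "office"}, "accept")]
    + [({"place": "home"}, "accept")] * 5
)
noisy = find_noisy(calls, threshold=0.3)
```

In this toy log the single accepted call at the office contradicts the user's dominant behavior there and is the only instance flagged. A cleaned training set for a downstream classifier such as a decision tree would then omit the flagged indices.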