Anwar Haque

LG
h-index: 1
15 papers
129 citations
Novelty: 35%
AI Score: 25

15 Papers

CR Jan 5, 2023
DRL-GAN: A Hybrid Approach for Binary and Multiclass Network Intrusion Detection

Caroline Strickland, Chandrika Saha, Muhammad Zakar et al.

Our increasingly connected world continues to face an ever-growing number of network-based attacks. Intrusion detection systems (IDS) are an essential security technology for detecting these attacks. Although numerous machine learning-based IDS have been proposed for the detection of malicious network traffic, the majority have difficulty properly detecting and classifying the more uncommon attack types. In this paper, we implement a novel hybrid technique that uses synthetic data produced by a Generative Adversarial Network (GAN) as input for training a Deep Reinforcement Learning (DRL) model. Our GAN model is trained with the NSL-KDD dataset for four attack categories as well as normal network flow. Ultimately, our findings demonstrate that training the DRL model on specific synthetic datasets can result in better performance in correctly classifying minority classes than training on the true imbalanced dataset.

LG Sep 20, 2024
An Adaptive End-to-End IoT Security Framework Using Explainable AI and LLMs

Sudipto Baral, Sajal Saha, Anwar Haque

The exponential growth of the Internet of Things (IoT) has significantly increased the complexity and volume of cybersecurity threats, necessitating the development of advanced, scalable, and interpretable security frameworks. This paper presents an innovative, comprehensive framework for real-time IoT attack detection and response that leverages Machine Learning (ML), Explainable AI (XAI), and Large Language Models (LLMs). By integrating XAI techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) with a model-independent architecture, we ensure our framework's adaptability across various ML algorithms. Additionally, the incorporation of LLMs enhances the interpretability and accessibility of detection decisions, providing system administrators with actionable, human-understandable explanations of detected threats. Our end-to-end framework not only facilitates a seamless transition from model development to deployment but also represents a real-world application capability that is often lacking in existing research. Based on our experiments with the CIC-IOT-2023 dataset, Gemini and OpenAI LLMs demonstrate unique strengths in attack mitigation: Gemini offers precise, focused strategies, while OpenAI provides extensive, in-depth security measures. Incorporating SHAP and LIME algorithms within XAI provides comprehensive insights into attack detection, emphasizing opportunities for model improvement through detailed feature analysis, fine-tuning, and the adaptation of misclassifications to enhance accuracy.

LG May 3, 2022
Deep Sequence Modeling for Anomalous ISP Traffic Prediction

Sajal Saha, Anwar Haque, Greg Sidebottom

Internet traffic in the real world is susceptible to various external and internal factors which may abruptly change the normal traffic flow. Those unexpected changes are considered outliers in traffic. Although deep sequence models have been used to predict complex IP traffic, their comparative performance for anomalous traffic has not been studied extensively. In this paper, we investigated and evaluated the performance of different deep sequence models for anomalous traffic prediction. Several deep sequence models were implemented to predict real traffic without and with outliers and show the significance of outlier detection in real-world traffic prediction. First, two outlier detection techniques, the Three-Sigma rule and Isolation Forest, were applied to identify the anomalies. Second, we adjusted those abnormal data points using the Backward Filling technique before training the model. Finally, the performance of different models was compared for abnormal and adjusted traffic. LSTM_Encoder_Decoder (LSTM_En_De) is the best prediction model in our experiment, reducing the deviation between actual and predicted traffic by more than 11% after adjusting the outliers. All other models, including Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), LSTM_En_De with Attention layer (LSTM_En_De_Atn), and Gated Recurrent Unit (GRU), also predict better after the outliers are replaced, decreasing prediction error by more than 29%, 24%, 19%, and 10%, respectively. Our experimental results indicate that the outliers in the data can significantly impact the quality of the prediction. Thus, outlier detection and mitigation assist the deep sequence model in learning the general trend and making better predictions.
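
The outlier handling described in this abstract (Three-Sigma detection followed by Backward Filling) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the population standard deviation is an assumption, and Isolation Forest is left out to keep the example dependency-free.

```python
from statistics import mean, pstdev


def three_sigma_outliers(series):
    """Return indices of points more than three standard deviations from the mean."""
    mu = mean(series)
    sigma = pstdev(series)  # population standard deviation (an assumption)
    return [i for i, x in enumerate(series) if abs(x - mu) > 3 * sigma]


def backward_fill(series, outlier_idx):
    """Replace each flagged point with the next non-flagged value in time.

    Flagged points at the very end of the series have no later valid value
    and are left unchanged.
    """
    flagged = set(outlier_idx)
    filled = list(series)
    next_valid = None
    # Walk from the end so the most recent valid value to the right is known.
    for i in range(len(filled) - 1, -1, -1):
        if i in flagged:
            if next_valid is not None:
                filled[i] = next_valid
        else:
            next_valid = filled[i]
    return filled


# Toy traffic series with one obvious spike (illustrative data only).
traffic = [10.0] * 20 + [500.0] + [12.0] * 20
spikes = three_sigma_outliers(traffic)
cleaned = backward_fill(traffic, spikes)
```

Note that the Three-Sigma rule needs a reasonably long series: with very few samples, a single spike inflates the standard deviation so much that it can never exceed the threshold.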

NI May 3, 2022
An Empirical Study on Internet Traffic Prediction Using Statistical Rolling Model

Sajal Saha, Anwar Haque, Greg Sidebottom

Real-world IP network traffic is susceptible to external and internal factors such as new internet service integration, traffic migration, internet application, etc. Due to these factors, the actual internet traffic is non-linear and challenging to analyze using a statistical model for future prediction. In this paper, we investigated and evaluated the performance of different statistical prediction models for real IP network traffic and showed a significant improvement in prediction using the rolling prediction technique. Initially, a set of best hyper-parameters for the corresponding prediction model is identified by analyzing the traffic characteristics and implementing a grid search algorithm based on the minimum Akaike Information Criterion (AIC). Then, we performed a comparative performance analysis among AutoRegressive Integrated Moving Average (ARIMA), Seasonal ARIMA (SARIMA), SARIMA with eXogenous factors (SARIMAX), and Holt-Winters for single-step prediction. The seasonality of our traffic has been explicitly modeled using SARIMA, which reduces the rolling prediction Mean Absolute Percentage Error (MAPE) by more than 4% compared to ARIMA (which cannot handle seasonality). We further improved traffic prediction using SARIMAX to learn different exogenous factors extracted from the original traffic, which yielded the best rolling prediction results with a MAPE of 6.83%. Finally, we applied the exponential smoothing technique to handle the variability in traffic following the Holt-Winters model, which exhibited a better prediction than ARIMA (around 1.5% less MAPE). The rolling prediction technique reduced prediction error using real Internet Service Provider (ISP) traffic data by more than 50% compared to the standard prediction method.
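
The difference between rolling and standard prediction can be sketched as follows. This is an illustration only: a seasonal-naive forecaster stands in for ARIMA/SARIMA so the example has no dependencies, and all function names are hypothetical. In the rolling variant, each true observation is added to the history before the next single-step prediction; in the standard variant, the model only ever sees the training data.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def seasonal_naive(history, period):
    """Stand-in forecaster (not ARIMA): predict the value seen one season ago."""
    return history[-period]


def rolling_forecast(train, test, period):
    """Single-step predictions, revealing each true value before the next step."""
    history = list(train)
    preds = []
    for observed in test:
        preds.append(seasonal_naive(history, period))
        history.append(observed)
    return preds


def static_forecast(train, horizon, period):
    """Predict the whole horizon from training data, feeding predictions back in."""
    history = list(train)
    preds = []
    for _ in range(horizon):
        p = seasonal_naive(history, period)
        preds.append(p)
        history.append(p)
    return preds


# Toy seasonal series with a slow upward drift (illustrative data only).
train = [10, 20, 30, 10, 20, 30]
test = [12, 22, 32, 14, 24, 34]
rolling_error = mape(test, rolling_forecast(train, test, period=3))
static_error = mape(test, static_forecast(train, len(test), period=3))
```

On this toy series the rolling forecast tracks the drift because it keeps seeing new observations, while the static forecast repeats the last training season.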

LG May 9, 2022
Transfer Learning Based Efficient Traffic Prediction with Limited Training Data

Sajal Saha, Anwar Haque, Greg Sidebottom

Efficient prediction of internet traffic is an essential part of a Self Organizing Network (SON) for ensuring proactive management. There are many existing solutions for internet traffic prediction with higher accuracy using deep learning. However, designing individual predictive models for each service provider in the network is challenging due to data heterogeneity, scarcity, and abnormality. Moreover, the performance of deep sequence models in network traffic prediction with limited training data has not been studied extensively in current works. In this paper, we investigated and evaluated the performance of the deep transfer learning technique in traffic prediction with inadequate historical data, leveraging the knowledge of our pre-trained model. First, we used a comparatively larger real-world traffic dataset for source domain prediction based on five different deep sequence models: Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), LSTM Encoder-Decoder (LSTM_En_De), LSTM_En_De with Attention layer (LSTM_En_De_Atn), and Gated Recurrent Unit (GRU). Then, the two best-performing models from the source domain, LSTM_En_De and LSTM_En_De_Atn, with accuracies of 96.06% and 96.05%, are considered for the target domain prediction. Finally, four smaller traffic datasets collected for four particular source and destination pairs are used in the target domain to compare the performance of standard learning and transfer learning in terms of accuracy and execution time. According to our experimental results, transfer learning helps to reduce the execution time for most cases, while the model's accuracy is improved in transfer learning with a larger training session.

LG May 9, 2022
Wavelet-Based Hybrid Machine Learning Model for Out-of-distribution Internet Traffic Prediction

Sajal Saha, Anwar Haque, Greg Sidebottom

Efficient prediction of internet traffic is essential for ensuring proactive management of computer networks. Nowadays, machine learning approaches show promising performance in modeling real-world complex traffic. However, most existing works assume that model training and evaluation data come from an identical distribution. In practice, there is a high probability that the model will deal with data from a slightly or entirely unknown distribution in the deployment phase. This paper investigated and evaluated machine learning performance using eXtreme Gradient Boosting, Light Gradient Boosting Machine, Stochastic Gradient Descent, Gradient Boosting Regressor, CatBoost Regressor, and their stacked ensemble model using both identically distributed and out-of-distribution data. Also, we proposed a hybrid machine learning model integrating wavelet decomposition for improving out-of-distribution prediction, as standalone models were unable to generalize very well. Our experimental results show the best performance of the standalone ensemble model with an accuracy of 96.4%, while the hybrid ensemble model improved it by 1% for in-distribution data. But its performance dropped significantly when tested with three different datasets whose distribution is shifted relative to the training set. However, our proposed hybrid model considerably reduces the performance gap between identical and out-of-distribution evaluation compared with the standalone model, indicating the decomposition technique's effectiveness in the case of out-of-distribution generalization.

LG May 3, 2022
Towards an Ensemble Regressor Model for Anomalous ISP Traffic Prediction

Sajal Saha, Anwar Haque, Greg Sidebottom

Prediction of network traffic behavior is significant for the effective management of modern telecommunication networks. However, the intuitive approach of predicting network traffic using administrative experience and market analysis data is inadequate for an efficient forecast framework. As a result, many different mathematical models have been studied to capture the general trend of the network traffic and predict accordingly. But a comprehensive performance analysis of varying regression models and their ensemble has not been carried out before for real-world anomalous traffic. In this paper, several regression models such as eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Stochastic Gradient Descent (SGD), Gradient Boosting Regressor (GBR), and CatBoost Regressor were analyzed to predict real traffic without and with outliers and show the significance of outlier detection in real-world traffic prediction. Also, we showed that the ensemble regression model outperforms the individual prediction models. We compared the performance of different regression models based on five different feature sets of lengths 6, 9, 12, 15, and 18. Our ensemble regression model achieved the minimum average gap of 5.04% between actual and predicted traffic with nine outlier-adjusted inputs. In general, our experimental results indicate that the outliers in the data can significantly impact the quality of the prediction. Thus, outlier detection and mitigation assist the regression model in learning the general trend and making better predictions.
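
One plausible reading of the "feature sets of lengths 6, 9, 12, 15, and 18" is that each regression input is a window of that many past traffic values. The abstract does not spell this out, so the sketch below is an assumption, and `make_lagged_dataset` is a hypothetical helper name.

```python
def make_lagged_dataset(series, n_lags):
    """Turn a univariate series into (X, y) pairs for a regression model.

    Each row of X holds the n_lags values that precede the target in y.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t])
    return X, y


# Build one dataset per candidate window length (toy data only).
traffic = list(range(40))
datasets = {n: make_lagged_dataset(traffic, n) for n in (6, 9, 12, 15, 18)}
```

Each resulting `(X, y)` pair could then be passed to any regressor, which makes it straightforward to compare window lengths on equal terms.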

NI Sep 23, 2024
Intelligent Routing Algorithm over SDN: Reusable Reinforcement Learning Approach

Wang Wumian, Sajal Saha, Anwar Haque et al.

Traffic routing is vital for the proper functioning of the Internet. As users and network traffic increase, researchers try to develop adaptive and intelligent routing algorithms that can fulfill various QoS requirements. Reinforcement Learning (RL) based routing algorithms have shown better performance than traditional approaches. We developed a QoS-aware, reusable RL routing algorithm, RLSR-Routing over SDN. During the learning process, our algorithm ensures loop-free path exploration. While finding the path for one traffic demand (a source-destination pair with a certain amount of traffic), RLSR-Routing learns the overall network QoS status, which can be used to speed up algorithm convergence when finding the path for other traffic demands. By adapting Segment Routing, our algorithm can achieve flow-based, source packet routing and reduce the communication required between the SDN controller and the network plane. Our algorithm shows better performance in terms of load balancing than the traditional approaches. It also has faster convergence than the non-reusable RL approach when finding paths for multiple traffic demands.

LG Sep 20, 2024
Overcoming Data Limitations in Internet Traffic Forecasting: LSTM Models with Transfer Learning and Wavelet Augmentation

Sajal Saha, Anwar Haque, Greg Sidebottom

Effective internet traffic prediction in smaller ISP networks is challenged by limited data availability. This paper explores this issue using transfer learning and data augmentation techniques with two LSTM-based models, LSTMSeq2Seq and LSTMSeq2SeqAtn, initially trained on a comprehensive dataset provided by Juniper Networks and subsequently applied to smaller datasets. The datasets represent real internet traffic telemetry, offering insights into diverse traffic patterns across different network domains. Our study revealed that while both models performed well in single-step predictions, multi-step forecasts were challenging, particularly in terms of long-term accuracy. In smaller datasets, LSTMSeq2Seq generally outperformed LSTMSeq2SeqAtn, indicating that higher model complexity does not necessarily translate to better performance. The models' effectiveness varied across different network domains, reflecting the influence of distinct traffic characteristics. To address data scarcity, Discrete Wavelet Transform was used for data augmentation, leading to significant improvements in model performance, especially in shorter-term forecasts. Our analysis showed that data augmentation is crucial in scenarios with limited data. Additionally, the study included an analysis of the models' variability and consistency, with attention mechanisms in LSTMSeq2SeqAtn providing better short-term forecasting consistency but greater variability in longer forecasts. The results highlight the benefits and limitations of different modeling approaches in traffic prediction. Overall, this research underscores the importance of transfer learning and data augmentation in enhancing the accuracy of traffic prediction models, particularly in smaller ISP networks with limited data availability.

LG Jun 6, 2023
DEK-Forecaster: A Novel Deep Learning Model Integrated with EMD-KNN for Traffic Prediction

Sajal Saha, Sudipto Baral, Anwar Haque

Internet traffic volume estimation has a significant impact on the business policies of the ISP (Internet Service Provider) industry and business successions. Forecasting the internet traffic demand helps to shed light on the future traffic trend, which is often helpful for ISPs' decision-making in network planning activities and investments. Besides, the capability to understand future trends contributes to managing regular and long-term operations. This study aims to predict the network traffic volume demand using deep sequence methods that incorporate Empirical Mode Decomposition (EMD) based noise reduction, Empirical rule based outlier detection, and K-Nearest Neighbour (KNN) based outlier mitigation. In contrast to former studies, the proposed model does not rely on a particular EMD decomposed component, called an Intrinsic Mode Function (IMF), for signal denoising. In our proposed traffic prediction model, we used an average of all IMF components for signal denoising. Moreover, the abnormal data points are replaced by the average of the K nearest data points, and the value of K has been optimized based on the KNN regressor prediction error measured in Root Mean Squared Error (RMSE). Finally, we selected the best time-lagged feature subset for our prediction model based on AutoRegressive Integrated Moving Average (ARIMA) and Akaike Information Criterion (AIC) value. Our experiments are conducted on real-world internet traffic datasets from industry, and the proposed method is compared with various traditional deep sequence baseline models. Our results show that the proposed EMD-KNN integrated prediction models outperform comparative models.
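
The KNN-based outlier mitigation step (replacing an abnormal point with the average of its K nearest data points) can be sketched as below. This is a simplified illustration, not the authors' implementation: it assumes "nearest" means nearest in time among points not flagged as outliers, `knn_replace` is a hypothetical name, and the RMSE-based tuning of K is omitted.

```python
def knn_replace(series, outlier_idx, k):
    """Replace each flagged point with the mean of its k nearest valid neighbours.

    Neighbours are chosen by distance in time index (an assumption), and
    flagged points are never used as neighbours. Assumes at least one
    non-flagged point exists.
    """
    flagged = set(outlier_idx)
    valid = [i for i in range(len(series)) if i not in flagged]
    repaired = list(series)
    for i in flagged:
        nearest = sorted(valid, key=lambda j: abs(j - i))[:k]
        repaired[i] = sum(series[j] for j in nearest) / len(nearest)
    return repaired


# Toy series with one abnormal point at index 2 (illustrative data only).
series = [1.0, 2.0, 100.0, 4.0, 5.0]
repaired = knn_replace(series, outlier_idx=[2], k=2)
```

In this toy case the spike at index 2 is replaced by the mean of its two immediate neighbours.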

CV Nov 13, 2024
Weakly-Supervised Anomaly Detection in Surveillance Videos Based on Two-Stream I3D Convolution Network

Sareh Soltani Nejad, Anwar Haque

The widespread implementation of urban surveillance systems has necessitated more sophisticated techniques for anomaly detection to ensure enhanced public safety. This paper presents a significant advancement in the field of anomaly detection through the application of Two-Stream Inflated 3D (I3D) Convolutional Networks. These networks substantially outperform traditional 3D Convolutional Networks (C3D) by more effectively extracting spatial and temporal features from surveillance videos, thus improving the precision of anomaly detection. Our research advances the field by implementing a weakly supervised learning framework based on Multiple Instance Learning (MIL), which uniquely conceptualizes surveillance videos as collections of 'bags' that contain instances (video clips). Each instance is innovatively processed through a ranking mechanism that prioritizes clips based on their potential to display anomalies. This novel strategy not only enhances the accuracy and precision of anomaly detection but also significantly diminishes the dependency on extensive manual annotations. Moreover, through meticulous optimization of model settings, including the choice of optimizer, our approach not only establishes new benchmarks in the performance of anomaly detection systems but also offers a scalable and efficient solution for real-world surveillance applications. This paper contributes significantly to the field of computer vision by delivering a more adaptable, efficient, and context-aware anomaly detection system, which is poised to redefine practices in urban surveillance.

LG Jan 25, 2021
Appliance Operation Modes Identification Using Cycles Clustering

Abdelkareem Jaradat, Hanan Lutfiyya, Anwar Haque

Increasing costs, energy demand, and environmental issues have led many researchers to find approaches for energy monitoring, and hence energy conservation. The emerging technologies of Internet of Things (IoT) and Machine Learning (ML) deliver techniques that have the potential to efficiently conserve energy and improve the utilization of energy consumption. Smart Home Energy Management Systems (SHEMSs) have the potential to contribute to energy conservation through the application of Demand Response (DR) in the residential sector. In this paper, we propose appliance Operation Modes Identification using Cycles Clustering (OMICC), a fundamental SHEMS approach that utilizes the sensed residential disaggregated power consumption to support DR by giving consumers the opportunity to select lighter appliance operation modes. The cycles of the Single Usage Profile (SUP) of an appliance are extracted and reformed into features in terms of clusters of cycles. These features are then used to identify the operation mode used in every occurrence using K-Nearest Neighbors (KNN). Operation mode identification is considered a basis for many potential smart DR applications within SHEMS, aimed at either consumers or suppliers.

LG Jan 4, 2021
Towards Network Traffic Monitoring Using Deep Transfer Learning

Harsh Dhillon, Anwar Haque

Network traffic is growing at a rapid pace globally. The modern network infrastructure makes classic network intrusion detection methods inefficient at classifying an inflow of vast network traffic. This paper aims to present a modern approach towards building a network intrusion detection system (NIDS) by using various deep learning methods. To further improve our proposed scheme and make it effective in real-world settings, we use deep transfer learning techniques where we transfer the knowledge learned by our model in a source domain with plentiful computational and data resources to a target domain with sparse availability of both resources. Our proposed method achieved a 98.30% classification accuracy score in the source domain and an improved 98.43% classification accuracy score in the target domain, with a boost in classification speed, using the UNSW-15 dataset. This study demonstrates that deep transfer learning techniques make it possible to construct large deep learning models to perform network classification, which can be deployed in real-world target domains where they can maintain their classification performance and improve their classification speed despite the limited accessibility of resources.

NI Sep 4, 2020
Machine Learning Towards Enabling Spectrum-as-a-Service Dynamic Sharing

Abdallah Moubayed, Tanveer Ahmed, Anwar Haque et al.

The growth in wireless broadband users, devices, and novel applications has led to a significant increase in the demand for new radio frequency spectrum. This is expected to grow even further given the projection that the global traffic per year will reach 4.8 zettabytes by 2022. Moreover, it is projected that the number of Internet users will reach 4.8 billion and the number of connected devices will be close to 28.5 billion. However, due to the spectrum being mostly allocated and divided, providing more spectrum to expand existing services or offer new ones has become more challenging. To address this, spectrum sharing has been proposed as a potential solution to improve spectrum utilization efficiency. Adopting effective and efficient spectrum sharing mechanisms is in itself a challenging task given the multitude of levels and techniques that can be integrated to enable it. To that end, this paper provides an overview of the different spectrum sharing levels and techniques that have been proposed in the literature. Moreover, it discusses the potential of adopting dynamic sharing mechanisms by offering a Spectrum-as-a-Service architecture. Furthermore, it describes the potential role of machine learning models in facilitating the automated and efficient dynamic sharing of the spectrum and offering Spectrum-as-a-Service.

SP Aug 7, 2020
Demand Response For Residential Uses: A Data Analytics Approach

Abdelkareem Jaradat, Hanan Lutfiyya, Anwar Haque

In the Smart Grid environment, the advent of intelligent measuring devices facilitates monitoring appliance electricity consumption. This data can be used to apply Demand Response (DR) in residential houses through data analytics and the development of data mining techniques. In this research, we introduce a smart system foundation that is applied to users' disaggregated power consumption data. This system encourages users to apply DR by switching from heavier operation modes to lighter ones and by shifting their usage to off-peak hours. First, we apply Cross Correlation (XCORR) to detect the times at which an appliance is being used. We then use Dynamic Time Warping (DTW) to recognize the operation mode used.
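
The DTW step can be illustrated with a minimal sketch: a textbook dynamic-programming DTW distance and a nearest-template rule for picking the operation mode. The template names, the absolute-difference cost, and the function names are assumptions made for illustration. The abstract does not describe the authors' actual matching procedure, and the XCORR detection step is omitted.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) Dynamic Time Warping distance.

    Uses absolute difference as the local cost and allows the usual three
    moves (match, insertion, deletion).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize_mode(profile, templates):
    """Return the name of the template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(profile, templates[name]))


# Hypothetical power profiles per operation mode (illustrative data only).
templates = {"eco": [0, 1, 1, 0], "heavy": [0, 5, 5, 5, 0]}
mode = recognize_mode([0, 5, 5, 0], templates)
```

DTW is a natural fit here because the same operation mode can run for slightly different durations, and warping lets profiles of different lengths be compared directly.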