AIOct 11, 2023
Give and Take: Federated Transfer Learning for Industrial IoT Network Intrusion DetectionLochana Telugu Rajesh, Tapadhir Das, Raj Mani Shukla et al.
The rapid growth in Internet of Things (IoT) technology has become an integral part of today's industries forming the Industrial IoT (IIoT) initiative, where industries are leveraging IoT to improve communication and connectivity via emerging solutions like data analytics and cloud computing. Unfortunately, the rapid use of IoT has made it an attractive target for cybercriminals. Therefore, protecting these systems is of utmost importance. In this paper, we propose a federated transfer learning (FTL) approach to perform IIoT network intrusion detection. As part of the research, we also propose a combinational neural network as the centerpiece for performing FTL. The proposed technique splits IoT data between the client and server devices to generate corresponding models, and the weights of the client models are combined to update the server model. Results showcase high performance for the FTL setup between iterations on both the IIoT clients and the server. Additionally, the proposed FTL setup achieves better overall performance than contemporary machine learning algorithms at performing network intrusion detection.
LGJan 26
Benchmarking Machine Learning Models for IoT Malware Detection under Data Scarcity and DriftJake Lyon, Ehsan Saeedizade, Shamik Sengupta
The rapid expansion of the Internet of Things (IoT) in domains such as smart cities, transportation, and industrial systems has heightened the urgency of addressing their security vulnerabilities. IoT devices often operate under limited computational resources, lack robust physical safeguards, and are deployed in heterogeneous and dynamic networks, making them prime targets for cyberattacks and malware applications. Machine learning (ML) offers a promising approach to automated malware detection and classification, but practical deployment requires models that are both effective and lightweight. The goal of this study is to investigate the effectiveness of four supervised learning models (Random Forest, LightGBM, Logistic Regression, and a Multi-Layer Perceptron) for malware detection and classification using the IoT-23 dataset. We evaluate model performance in both binary and multiclass classification tasks, assess sensitivity to training data volume, and analyze temporal robustness to simulate deployment in evolving threat landscapes. Our results show that tree-based models achieve high accuracy and generalization, even with limited training data, while performance deteriorates over time as malware diversity increases. These findings underscore the importance of adaptive, resource-efficient ML models for securing IoT systems in real-world environments.
NIFeb 12, 2024
Locality Sensitive Hashing for Network Traffic FingerprintingNowfel Mashnoor, Jay Thom, Abdur Rouf et al.
The advent of the Internet of Things (IoT) has brought forth additional intricacies and difficulties to computer networks. These gadgets are particularly susceptible to cyber-attacks because of their simplistic design. Therefore, it is crucial to recognise these devices inside a network for the purpose of network administration and to identify any harmful actions. Network traffic fingerprinting is a crucial technique for identifying devices and detecting anomalies. Currently, the predominant methods for this depend heavily on machine learning (ML). Nevertheless, machine learning (ML) methods need the selection of features, adjustment of hyperparameters, and retraining of models to attain optimal outcomes and provide resilience to concept drifts detected in a network. In this research, we suggest using locality-sensitive hashing (LSH) for network traffic fingerprinting as a solution to these difficulties. Our study focuses on examining several design options for the Nilsimsa LSH function. We then use this function to create unique fingerprints for network data, which may be used to identify devices. We also compared it with ML-based traffic fingerprinting and observed that our method increases the accuracy of state-of-the-art by 12% achieving around 94% accuracy in identifying devices in a network.
CRAug 27, 2021
Modeling and Analyzing Attacker Behavior in IoT Botnet using Temporal Convolution Network (TCN)Farhan Sadique, Shamik Sengupta
Traditional reactive approach of blacklisting botnets fails to adapt to the rapidly evolving landscape of cyberattacks. An automated and proactive approach to detect and block botnet hosts will immensely benefit the industry. Behavioral analysis of attackers is shown to be effective against a wide variety of attack types. Previous works, however, focus solely on anomalies in network traffic to detect bots and botnet. In this work we take a more robust approach of analyzing the heterogeneous events including network traffic, file download events, SSH logins and chain of commands input by attackers in a compromised host. We have deployed several honeypots to simulate Linux shells and allowed attackers access to the shells. We have collected a large dataset of heterogeneous threat events from the honeypots. We have then combined and modeled the heterogeneous threat data to analyze attacker behavior. Then we have used a deep learning architecture called a Temporal Convolutional Network (TCN) to do sequential and predictive analysis on the data. A prediction accuracy of $85-97\%$ validates our data model as well as our analysis methodology. In this work, we have also developed an automated mechanism to collect and analyze these data. For the automation we have used CYbersecurity information Exchange (CYBEX). Finally, we have compared TCN with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) and have showed that TCN outperforms LSTM and GRU for the task at hand.
CRAug 4, 2021
Semi-supervised Conditional GAN for Simultaneous Generation and Detection of Phishing URLs: A Game theoretic PerspectiveSharif Amit Kamran, Shamik Sengupta, Alireza Tavakkoli
Spear Phishing is a type of cyber-attack where the attacker sends hyperlinks through email on well-researched targets. The objective is to obtain sensitive information by imitating oneself as a trustworthy website. In recent times, deep learning has become the standard for defending against such attacks. However, these architectures were designed with only defense in mind. Moreover, the attacker's perspective and motivation are absent while creating such models. To address this, we need a game-theoretic approach to understand the perspective of the attacker (Hacker) and the defender (Phishing URL detector). We propose a Conditional Generative Adversarial Network with novel training strategy for real-time phishing URL detection. Additionally, we train our architecture in a semi-supervised manner to distinguish between adversarial and real examples, along with detecting malicious and benign URLs. We also design two games between the attacker and defender in training and deployment settings by utilizing the game-theoretic perspective. Our experiments confirm that the proposed architecture surpasses recent state-of-the-art architectures for phishing URLs detection.
CRJun 8, 2021
Analysis of Attacker Behavior in Compromised Hosts During Command and ControlFarhan Sadique, Shamik Sengupta
Traditional reactive approach of blacklisting botnets fails to adapt to the rapidly evolving landscape of cyberattacks. An automated and proactive approach to detect and block botnet hosts will immensely benefit the industry. Behavioral analysis of botnet is shown to be effective against a wide variety of attack types. Current works, however, focus solely on analyzing network traffic from and to the bots. In this work we take a different approach of analyzing the chain of commands input by attackers in a compromised host. We have deployed several honeypots to simulate Linux shells and allowed attackers access to the shells to collect a large dataset of commands. We have further developed an automated mechanism to analyze these data. For the automation we have developed a system called CYbersecurity information Exchange with Privacy (CYBEX-P). Finally, we have done a sequential analysis on the dataset to show that we can successfully predict attacker behavior from the shell commands without analyzing network traffic like previous works.
CRJun 3, 2021
Cybersecurity Information Exchange with Privacy (CYBEX-P) and TAHOE -- A Cyberthreat LanguageFarhan Sadique, Ignacio Astaburuaga, Raghav Kaul et al.
Cybersecurity information sharing (CIS) is envisioned to protect organizations more effectively from advanced cyber attacks. However, a completely automated CIS platform is not widely adopted. The major challenges are: (1) the absence of a robust cyber threat language (CTL) and (2) the concerns over data privacy. This work introduces Cybersecurity Information Exchangewith Privacy (CYBEX-P), as a CIS framework, to tackle these challenges. CYBEX-P allows organizations to share heterogeneous data with granular, attribute based privacy control. It correlates the data to automatically generate intuitive reports and defensive rules. To achieve such versatility, we have developed TAHOE - a graph based CTL. TAHOE is a structure for storing,sharing and analyzing threat data. It also intrinsically correlates the data. We have further developed a universal Threat Data Query Language (TDQL). In this paper, we propose the system architecture for CYBEX-P. We then discuss its scalability and privacy features along with a use case of CYBEX-P providing Infrastructure as a Service (IaaS). We further introduce TAHOE& TDQL as better alternatives to existing CTLs and formulate ThreatRank - an algorithm to detect new malicious even
GTJan 29, 2021
Finding the Sweet Spot for Data Anonymization: A Mechanism Design PerspectiveAbdelrahman Eldosouky, Tapadhir Das, Anuraag Kotra et al.
Data sharing between different organizations is an essential process in today's connected world. However, recently there were many concerns about data sharing as sharing sensitive information can jeopardize users' privacy. To preserve the privacy, organizations use anonymization techniques to conceal users' sensitive data. However, these techniques are vulnerable to de-anonymization attacks which aim to identify individual records within a dataset. In this paper, a two-tier mathematical framework is proposed for analyzing and mitigating the de-anonymization attacks, by studying the interactions between sharing organizations, data collector, and a prospective attacker. In the first level, a game-theoretic model is proposed to enable sharing organizations to optimally select their anonymization levels for k-anonymization under two potential attacks: background-knowledge attack and homogeneity attack. In the second level, a contract-theoretic model is proposed to enable the data collector to optimally reward the organizations for their data. The formulated problems are studied under single-time sharing and repeated sharing scenarios. Different Nash equilibria for the proposed game and the optimal solution of the contract-based problem are analytically derived for both scenarios. Simulation results show that the organizations can optimally select their anonymization levels, while the data collector can benefit from incentivizing the organizations to share their data.