CR Oct 17, 2023
IoTGeM: Generalizable Models for Behaviour-Based IoT Attack Detection
Kahraman Kostas, Mike Just, Michael A. Lones
Previous research on behavior-based attack detection for networks of IoT devices has resulted in machine learning models whose ability to adapt to unseen data is limited and often not demonstrated. This paper presents IoTGeM, an approach for modeling IoT network attacks that focuses on generalizability, yet also leads to better detection and performance. We first introduce an improved rolling window approach for feature extraction. To reduce overfitting, we then apply a multi-step feature selection process where a Genetic Algorithm (GA) is uniquely guided by exogenous feedback from a separate, independent dataset. To prevent common data leaks that have limited previous models, we build and test our models using strictly isolated train and test datasets. The resulting models are rigorously evaluated using a diverse portfolio of machine learning algorithms and datasets. Our window-based models demonstrate superior generalization compared to traditional flow-based models, particularly when tested on unseen datasets. On these stringent, cross-dataset tests, IoTGeM achieves F1 scores of 99% for ACK, HTTP, SYN, MHD, and PS attacks, as well as a 94% F1 score for UDP attacks. Finally, we build confidence in the models by using the SHAP (SHapley Additive exPlanations) explainable AI technique, allowing us to identify the specific features that underlie the accurate detection of attacks.
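As a rough illustration of what window-based feature extraction can look like, the sketch below computes simple statistics over the most recent packets for each incoming packet. It is not the IoTGeM method itself: the function name `rolling_window_features`, the window length, the choice of statistics, and the sample packet sizes are all assumptions made for this example.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_window_features(packet_sizes, window=4):
    """Return one feature dict per packet, summarising the last `window` packets.

    Hypothetical helper for illustration only; the statistics chosen here
    (mean, population standard deviation, sum) are assumptions.
    """
    recent = deque(maxlen=window)  # the oldest packet is dropped automatically
    features = []
    for size in packet_sizes:
        recent.append(size)
        features.append({
            "size": size,                # the individual packet value
            "win_mean": mean(recent),    # context from neighbouring packets
            "win_std": pstdev(recent),
            "win_sum": sum(recent),
        })
    return features


# Made-up packet sizes in bytes, with one large packet in the middle.
sizes = [60, 60, 1500, 60, 60, 60]
feats = rolling_window_features(sizes, window=3)
```

Each row combines the current packet with context from its recent neighbours, which is the general idea behind describing traffic by windows rather than by isolated packets or whole flows.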
CR Nov 5, 2024
GeMID: Generalizable Models for IoT Device Identification
Kahraman Kostas, Rabia Yasa Kostas, Mike Just et al.
With the proliferation of devices on the Internet of Things (IoT), ensuring their security has become paramount. Device identification (DI), which distinguishes IoT devices based on their traffic patterns, plays a crucial role in both differentiating devices and identifying vulnerable ones, closing a serious security gap. However, existing approaches to DI that build machine learning models often overlook the challenge of model generalizability across diverse network environments. In this study, we propose a novel framework to address this limitation and to evaluate the generalizability of DI models across datasets collected within different network environments. Our approach involves a two-step process. First, we develop a feature and model selection method that is more robust to generalization issues by using a genetic algorithm with external feedback and datasets from distinct environments to refine the selections. Second, the resulting DI models are tested on further independent datasets to robustly assess their generalizability. We demonstrate the effectiveness of our method by empirically comparing it to alternatives, highlighting how commonly employed techniques such as sliding windows and flow statistics are fundamentally limited in their generalizability. Moreover, we show that statistical methods, widely used in the literature, are unreliable for device identification due to their dependence on network-specific characteristics rather than device-intrinsic properties, challenging the validity of a significant portion of existing research. Our findings advance research in IoT security and device identification, offering insight into improving model effectiveness and mitigating risks in IoT networks.
CR Jun 7, 2024
Individual Packet Features are a Risk to Model Generalisation in ML-Based Intrusion Detection
Kahraman Kostas, Mike Just, Michael A. Lones
Machine learning is increasingly used for intrusion detection in IoT networks. This paper explores the effectiveness of using individual packet features (IPF), which are attributes extracted from a single network packet, such as timing, size, and source-destination information. Through literature review and experiments, we identify the limitations of IPF, showing they can produce misleadingly high detection rates. Our findings emphasize the need for approaches that consider packet interactions for robust intrusion detection. Additionally, we demonstrate that models based on IPF often fail to generalize across datasets, compromising their reliability in diverse IoT environments.
CR Feb 17, 2021
IoTDevID: A Behavior-Based Device Identification Method for the IoT
Kahraman Kostas, Mike Just, Michael A. Lones
Device identification is one way to secure a network of IoT devices, whereby devices identified as suspicious can subsequently be isolated from a network. In this study, we present a machine learning-based method, IoTDevID, that recognizes devices through characteristics of their network packets. As a result of using a rigorous feature analysis and selection process, our study offers a generalizable and realistic approach to modelling device behavior, achieving high predictive accuracy across two public datasets. The model's underlying feature set is shown to be more predictive than existing feature sets used for device identification, and is shown to generalize to data unseen during the feature selection process. Unlike most existing approaches to IoT device identification, IoTDevID is able to detect devices using non-IP and low-energy protocols.
CR Sep 29, 2020
Tracking Mixed Bitcoins
Tin Tironsakkul, Manuel Maarek, Andrea Eross et al.
Mixer services purportedly remove all connections between the input (deposited) Bitcoins and the output (withdrawn) mixed Bitcoins, seemingly rendering taint analysis tracking ineffectual. In this paper, we introduce and explore a novel tracking strategy, called Address Taint Analysis, that is adapted from existing transaction-based taint analysis techniques for tracking Bitcoins that have passed through a mixer service. We also investigate the potential of combining address taint analysis with address clustering and backward tainting. We further introduce a set of filtering criteria that reduce the number of false-positive results based on the characteristics of withdrawn transactions, and evaluate our solution with verifiable mixing transactions of nine mixer services from previous reverse-engineering studies. Our findings show that it is possible to track the mixed Bitcoins from the deposited Bitcoins using address taint analysis, and that the number of potential transaction outputs can be significantly reduced with the filtering criteria.
CR Jun 13, 2019
Probing the Mystery of Cryptocurrency Theft: An Investigation into Methods for Taint Analysis
Tin Tironsakkul, Manuel Maarek, Andrea Eross et al.
Since the creation of Bitcoin, transaction tracking has been one of the prominent means for following the movement of Bitcoins involved in illegal activities. Although every Bitcoin transaction is recorded in the blockchain database, which is transparent for anyone to observe and analyse, Bitcoin's pseudonymity system and transaction obscuring techniques still allow criminals to disguise their transaction trail. While there have been a few attempts to develop tracking methods, there is no accepted evaluation method to measure their accuracy. Therefore, this paper investigates strategies for transaction tracking by introducing two new tainting methods, and proposes an address profiling approach with a metrics-based evaluation framework. We use our approach and framework to compare the accuracy of our new tainting methods with the previous tainting techniques, using data from two real Bitcoin theft transactions and several related control transactions.
CR Jun 22, 2015
Proceedings of the Ninth Workshop on Web 2.0 Security and Privacy (W2SP) 2015
Abigail Goldsteen, Tyrone Grandison, Mike Just et al.
This is the Proceedings of the Ninth Workshop on Web 2.0 Security and Privacy (W2SP) 2015, held in San Jose, CA, USA, on May 21, 2015. The workshop was held as part of the IEEE Computer Society Security and Privacy Workshops, in conjunction with the IEEE Symposium on Security and Privacy.
CR Oct 28, 2014
Data Driven Authentication: On the Effectiveness of User Behaviour Modelling with Mobile Device Sensors
Hilmi Gunes Kayacik, Mike Just, Lynne Baillie et al.
We propose a lightweight, and temporally and spatially aware user behaviour modelling technique for sensor-based authentication. Operating in the background, our data driven technique compares current behaviour with a user profile. If the behaviour deviates sufficiently from the established norm, actions such as explicit authentication can be triggered. To support a quick and lightweight deployment, our solution automatically switches from training mode to deployment mode when the user's behaviour is sufficiently learned. Furthermore, it allows the device to automatically determine a suitable detection threshold. We use our model to investigate practical aspects of sensor-based authentication by applying it to three publicly available data sets, computing expected times for training duration and behaviour drift. We also test our model with scenarios involving an attacker with varying knowledge and capabilities.
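The idea of comparing current behaviour with a learned profile and triggering explicit authentication on a large deviation can be sketched as follows. This is only a minimal illustration, not the authors' model: the functions `build_profile` and `needs_explicit_auth`, the mean-plus-`k`-standard-deviations threshold rule, and the sample sensor readings are all assumptions made for this example.

```python
from statistics import mean, pstdev


def build_profile(training_values, k=3.0):
    """Summarise training observations and derive a detection threshold.

    Assumption for illustration: the threshold is k population standard
    deviations, computed from the training data alone.
    """
    mu = mean(training_values)
    sigma = pstdev(training_values)
    return {"mean": mu, "std": sigma, "threshold": k * sigma}


def needs_explicit_auth(profile, observed):
    """Return True when the observation deviates too far from the profile."""
    return abs(observed - profile["mean"]) > profile["threshold"]


# Made-up readings from a single sensor during the training phase.
profile = build_profile([10.0, 12.0, 11.0, 9.0, 13.0])

typical = needs_explicit_auth(profile, 11.5)   # close to the learned norm
unusual = needs_explicit_auth(profile, 20.0)   # far from the learned norm
```

Deriving the threshold from the training data itself, rather than fixing it by hand, mirrors the abstract's point that the device can determine a suitable detection threshold automatically.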
CR Oct 24, 2014
Proceedings of the Third Workshop on Mobile Security Technologies (MoST) 2014
Larry Koved, Kapil Singh, Hao Chen et al.
This is the Proceedings of the Third Workshop on Mobile Security Technologies (MoST) 2014, held in San Jose, CA, USA, on May 17, 2014. The workshop was held as part of the IEEE Computer Society Security and Privacy Workshops, in conjunction with the IEEE Symposium on Security and Privacy.