Machine Learning Based IoT Intrusion Detection System: An MQTT Case Study (MQTT-IoT-IDS2020 Dataset)
This work addresses a gap in cybersecurity for IoT researchers and practitioners by providing a dataset and baseline results, though it is incremental as it applies existing ML methods to a new protocol-specific context.
The paper tackles the lack of intrusion detection datasets for MQTT-based IoT networks by generating and releasing a simulated dataset, and evaluates six machine learning techniques, finding that flow-based features are effective for detecting MQTT attacks with adequate performance.
The Internet of Things (IoT) is one of the main research fields in the Cybersecurity domain. This is due to (a) the increased dependency on automated device, and (b) the inadequacy of general purpose Intrusion Detection Systems (IDS) to be deployed for special purpose networks usage. Numerous lightweight protocols are being proposed for IoT devices communication usage. One of the distinguishable IoT machine-to-machine communication protocols is Message Queuing Telemetry Transport (MQTT) protocol. However, as per the authors best knowledge, there are no available IDS datasets that include MQTT benign or attack instances and thus, no IDS experimental results available. In this paper, the effectiveness of six Machine Learning (ML) techniques to detect MQTT-based attacks is evaluated. Three abstraction levels of features are assessed, namely, packet-based, unidirectional flow, and bidirectional flow features. An MQTT simulated dataset is generated and used for the training and evaluation processes. The dataset is released with an open access licence to help the research community further analyse the accompanied challenges. The experimental results demonstrated the adequacy of the proposed ML models to suit MQTT-based networks IDS requirements. Moreover, the results emphasise on the importance of using flow-based features to discriminate MQTT-based attacks from benign traffic, while packet-based features are sufficient for traditional networking attacks.