CR · LG · SE · Jul 29, 2022

Effectiveness of Transformer Models on IoT Security Detection in StackOverflow Discussions

arXiv:2207.14542v1 · h-index: 25
Originality: Synthesis-oriented
AI Analysis

This work helps developers and vendors by providing a domain-specific dataset and model for detecting IoT security discussions. Its contribution is incremental, since it applies existing transformer methods to a new dataset.

The paper tackles the problem of detecting IoT security discussions on Stack Overflow. It builds a manually labeled dataset of 7147 samples and evaluates multiple transformer models on it. Transferring knowledge from the general-purpose "Opiner" dataset causes a performance loss of up to 44%, while the domain-specific detector reaches an F1-Score of 0.69.

The Internet of Things (IoT) is an emerging concept that directly links to the billions of physical items, or "things", that are connected to the Internet and are all gathering and exchanging information between devices and systems. However, IoT devices were not built with security in mind, which might lead to security vulnerabilities in a multi-device system. Traditionally, we investigated IoT issues by polling IoT developers and specialists. This technique, however, is not scalable since surveying all IoT developers is not feasible. Another way to look into IoT issues is to look at IoT developer discussions on major online development forums like Stack Overflow (SO). However, finding discussions that are relevant to IoT issues is challenging since they are frequently not categorized with IoT-related terms. In this paper, we present the "IoT Security Dataset", a domain-specific dataset of 7147 samples focused solely on IoT security discussions. As there are no automated tools to label these samples, we manually labeled them. We further employed multiple transformer models to automatically detect security discussions. Through rigorous investigations, we found that IoT security discussions are different and more complex than traditional security discussions. We demonstrated a considerable performance loss (up to 44%) of transformer models on cross-domain datasets when we transferred knowledge from a general-purpose dataset "Opiner", supporting our claim. Thus, we built a domain-specific IoT security detector with an F1-Score of 0.69. We have made the dataset public in the hope that developers would learn more about the security discussion and vendors would enhance their concerns about product security.
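To make the reported numbers concrete, here is a minimal sketch of how an F1-Score for a binary "security / not security" detector can be computed, and of one way to express a cross-domain performance loss. The toy labels and the function names `f1_score` and `relative_loss` are illustrative assumptions, not the paper's evaluation code. The abstract also does not say whether the 44% figure is a relative or an absolute drop; the relative form below is only one possible reading.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_loss(in_domain_score, cross_domain_score):
    """Fraction of the in-domain score lost when evaluating cross-domain.

    Assumption: the loss is measured relative to the in-domain score.
    """
    return (in_domain_score - cross_domain_score) / in_domain_score


if __name__ == "__main__":
    # Hypothetical labels: 1 = security discussion, 0 = not security.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0]
    print("F1:", round(f1_score(y_true, y_pred), 3))
    # Hypothetical scores, chosen only to illustrate the calculation.
    print("Relative loss:", round(relative_loss(0.8, 0.6), 3))
```

In practice a library routine such as scikit-learn's `f1_score` would normally replace the hand-written function; the pure-Python version is used here only to keep the sketch dependency-free.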

