On the Elements of Datasets for Cyber Physical Systems Security
This paper addresses dataset limitations facing CPS security researchers, but its contribution is incremental because it builds on existing testbed-based approaches.
The paper tackles the problem of insufficient datasets for AI in Cyber Physical System (CPS) security by proposing a dataset architecture to capture system behavior and interactions, aiming to enhance AI algorithm performance.
Datasets are essential for applying AI algorithms to Cyber Physical System (CPS) security. Due to the scarcity of real CPS datasets, researchers have elected to generate their own datasets using either real or virtualized testbeds. However, unlike the systems studied in other AI domains, a CPS is a complex system with many interfaces that determine its behavior. A dataset that comprises merely a collection of sensor measurements and network traffic may not be sufficient to develop resilient AI defensive or offensive agents. In this paper, we study the \emph{elements} of CPS security datasets required to capture the system behavior and interactions, and propose a dataset architecture that has the potential to enhance the performance of AI algorithms in securing cyber physical systems. The framework includes dataset elements, attack representation, and required dataset features. We compare existing datasets to the proposed architecture to identify their current limitations and discuss the future of CPS dataset generation using testbeds.
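As a rough illustration of why sensor measurements and network traffic alone may fall short, the sketch below pairs each time step with an explicit attack annotation. Every name here (`AttackLabel`, `Sample`, `label_samples`, and all field names) is hypothetical and chosen only for illustration; none is taken from the paper, whose proposed architecture may be organized quite differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AttackLabel:
    """Hypothetical attack representation: what was attacked, how, and when."""
    name: str      # e.g. "false_data_injection" (illustrative value)
    target: str    # component or interface under attack
    start: float   # attack start time, in seconds
    end: float     # attack end time, in seconds


@dataclass
class Sample:
    """One time step combining physical and cyber observations."""
    timestamp: float
    sensors: Dict[str, float]                         # physical measurements
    packets: List[dict] = field(default_factory=list)  # network traffic summary
    attack: Optional[AttackLabel] = None              # None means normal operation


def label_samples(samples: List[Sample], attacks: List[AttackLabel]) -> List[Sample]:
    """Attach the first attack whose time window covers each sample."""
    for s in samples:
        s.attack = next(
            (a for a in attacks if a.start <= s.timestamp <= a.end), None
        )
    return samples


# Usage: three samples, with one attack active around t = 1.0.
samples = [Sample(timestamp=float(t), sensors={"tank_level": 0.5}) for t in range(3)]
attacks = [AttackLabel("false_data_injection", "tank_level_sensor", 0.5, 1.5)]
label_samples(samples, attacks)
```

Keeping the attack annotation next to the sensor and traffic fields is one possible way to let a learning agent relate system behavior to attacker actions; it is an assumption of this sketch, not a description of the paper's framework.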