PL · SE · May 11

CPSLint: A Domain-Specific Language Providing Data Validation and Sanitisation for Industrial Cyber-Physical Systems

arXiv:2510.18651 · 7.9 · h-index: 30
Predicted impact top 80% in PL · last 90 days · Originality: Synthesis-oriented
AI Analysis

The paper addresses data preprocessing for non-programming domain experts in industrial CPS, but its evaluation is limited to a single dataset and lacks comparison to baselines, which makes the contribution incremental.

CPSLint is a domain-specific language for data validation and sanitisation in industrial cyber-physical systems, enabling automatic detection and correction of common data corruption patterns with reduced manual effort and guaranteed consistency.

Industrial cyber-physical systems generate vast amounts of semi-structured time-series data that require careful preprocessing before they can be effectively used for machine learning applications such as fault detection and identification. Raw sensor datasets are often corrupted or incomplete, making it challenging to develop reliable solutions without proper data preparation and validation. In this paper, we introduce CPSLint, a domain-specific language for data validation and sanitisation. We present the design, implementation and evaluation of CPSLint, demonstrating its ability to automatically detect and correct common data corruption patterns while enabling non-programming domain experts to effectively prepare their data for analysis. We report evaluation results on a representative dataset, tracking memory consumption and CPU-time for sanitisation activities. Our approach offers several advantages over traditional methods, including reduced manual effort, guaranteed consistency and broader applicability across time-series datasets and projects.
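To make the abstract's terms concrete, the following is a minimal Python sketch of the kind of sanitisation pass it describes, together with the two metrics it reports (CPU time and memory). It is not CPSLint syntax and not the authors' implementation. The function names `sanitise` and `measure`, the corruption patterns chosen (duplicate timestamps, missing readings, out-of-range readings), the forward-fill repair strategy, and the sample values are all assumptions made for illustration.

```python
import time
import tracemalloc
from typing import List, Optional, Tuple

# Hypothetical sample layout: (timestamp in seconds, sensor reading or None).
Sample = Tuple[int, Optional[float]]


def sanitise(samples: List[Sample], lo: float, hi: float) -> List[Tuple[int, float]]:
    """Sort by timestamp, drop duplicate timestamps, and repair bad readings.

    A reading counts as bad if it is missing or lies outside [lo, hi].
    Bad readings are replaced by the last good value (forward fill);
    bad readings with no good predecessor are dropped.
    """
    cleaned: List[Tuple[int, float]] = []
    seen = set()
    last_good: Optional[float] = None
    for ts, value in sorted(samples, key=lambda s: s[0]):
        if ts in seen:
            continue  # keep only the first sample for each timestamp
        seen.add(ts)
        if value is None or not (lo <= value <= hi):
            if last_good is None:
                continue  # nothing to fill from yet
            value = last_good
        else:
            last_good = value
        cleaned.append((ts, value))
    return cleaned


def measure(samples: List[Sample], lo: float, hi: float):
    """Run sanitise() and return (cleaned data, CPU seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.process_time()
    cleaned = sanitise(samples, lo, hi)
    cpu_seconds = time.process_time() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return cleaned, cpu_seconds, peak_bytes


if __name__ == "__main__":
    # Illustrative data: out of order, one duplicate, one gap, one spike.
    raw = [(3, 21.0), (1, 20.5), (2, None), (3, 21.0), (4, 999.0), (5, 21.4)]
    cleaned, cpu_seconds, peak_bytes = measure(raw, lo=-40.0, hi=125.0)
    print(cleaned)
    print(f"CPU time: {cpu_seconds:.6f} s, peak memory: {peak_bytes} bytes")
```

A declarative language would presumably let a domain expert state the valid range and repair policy without writing the loop by hand; the sketch only shows what such rules might compile down to under the assumptions above.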

