LG · Jul 29, 2022
Design Methodology for Deep Out-of-Distribution Detectors in Real-Time Cyber-Physical Systems
Michael Yuhas, Daniel Jun Xian Ng, Arvind Easwaran
When machine learning (ML) models are supplied with data outside their training distribution, they are more likely to make inaccurate predictions; in a cyber-physical system (CPS), this could lead to catastrophic system failure. To mitigate this risk, an out-of-distribution (OOD) detector can run in parallel with an ML model and flag inputs that could lead to undesirable outcomes. Although OOD detectors have been well studied in terms of accuracy, there has been less focus on deployment to resource-constrained CPSs. In this study, a design methodology is proposed to tune deep OOD detectors to meet the accuracy and response time requirements of embedded applications. The methodology uses genetic algorithms to optimize the detector's preprocessing pipeline and selects a quantization method that balances robustness and response time. It also identifies several candidate task graphs under the Robot Operating System (ROS) for deployment of the selected design. The methodology is demonstrated on two variational-autoencoder-based OOD detectors from the literature on two embedded platforms. Insights into the trade-offs that occur during the design process are provided, and it is shown that this design methodology can lead to a drastic reduction in response time relative to an unoptimized OOD detector while maintaining comparable accuracy.
LG · Jul 25, 2021
Improving Variational Autoencoder based Out-of-Distribution Detection for Embedded Real-time Applications
Yeli Feng, Daniel Jun Xian Ng, Arvind Easwaran
Uncertainties in machine learning are a significant roadblock for its application in safety-critical cyber-physical systems (CPS). One source of uncertainty arises from distribution shifts in the input data between training and test scenarios. Detecting such distribution shifts in real time is an emerging approach to address the challenge. The high-dimensional input space in CPS applications involving imaging adds extra difficulty to the task. Generative learning models are widely adopted for the task, namely out-of-distribution (OoD) detection. To improve the state of the art, we studied existing proposals from both the machine learning and CPS fields. In the latter, real-time safety monitoring for autonomous driving agents has been a focus. Exploiting the spatiotemporal correlation of motion in videos, we can robustly detect hazardous motion around autonomous driving agents. Inspired by the latest advances in Variational Autoencoder (VAE) theory and practice, we tapped into the prior knowledge in data to further boost the robustness of OoD detection. Comparison studies over the nuScenes and Synthia data sets show that our methods significantly improve detection of OoD factors unique to driving scenarios, 42% better than state-of-the-art approaches. Our model also generalized near-perfectly, 97% better than the state of the art across the real-world and simulation driving data sets tested. Finally, we customized one proposed method into a twin-encoder model that can be deployed to resource-limited embedded devices for real-time OoD detection. Its execution time was reduced by more than a factor of four in low-precision 8-bit integer inference, while its detection capability is comparable to that of its corresponding floating-point model.
RO · Jun 30, 2021
Embedded out-of-distribution detection on an autonomous robot platform
Michael Yuhas, Yeli Feng, Daniel Jun Xian Ng et al.
Machine learning (ML) is actively finding its way into modern cyber-physical systems (CPS), many of which are safety-critical real-time systems. It is well known that ML outputs are not reliable when testing data are novel with regard to model training and validation data, i.e., out-of-distribution (OOD) test data. We implement an unsupervised deep neural network-based OOD detector on a real-time embedded autonomous Duckiebot and evaluate detection performance. Our OOD detector achieves a success rate of 87.5% for emergency stopping of a Duckiebot on a braking test bed we designed. We also provide a case analysis of computing resource challenges specific to the Robot Operating System (ROS) middleware on the Duckiebot.
SE · Apr 23, 2021
Monitoring Cumulative Cost Properties
Omar Al-Bataineh, Daniel Jun Xian Ng, Arvind Easwaran
This paper considers the problem of decentralized monitoring of a class of non-functional properties (NFPs) with quantitative operators, namely cumulative cost properties. The decentralized monitoring of NFPs can be a non-trivial task for several reasons: (i) they are typically expressed at a high abstraction level where inter-event dependencies are hidden, (ii) they are difficult to monitor in a decentralized way, and (iii) effective decomposition techniques are lacking. We address these issues by providing a formal framework for decentralized monitoring of LTL formulas with quantitative operators. The presented framework employs the tableau construction and a formula unwinding technique (i.e., a transformation technique that preserves the semantics of the original formula) to split and distribute the input LTL formula and the corresponding quantitative constraint such that monitoring can be performed in a decentralized manner. These techniques allow processes to detect violations of monitored properties early and perform corrective or recovery actions. We demonstrate the effectiveness of the presented framework using a case study based on a Fischertechnik training model, a sorting line that sorts tokens by color into storage bins. The analysis of the case study shows the effectiveness of the presented framework not only in early detection of violations, but also in developing failure recovery plans that can help to avoid serious impact of failures on the performance of the system.
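To make the idea of early violation detection for a cumulative cost property concrete, here is a minimal Python sketch. It is not the framework from the paper: it is a simplified, centralized monitor with no LTL, tableau construction, or formula unwinding, and it assumes non-negative per-event costs and a fixed budget. The names `CumulativeCostMonitor` and `first_violation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CumulativeCostMonitor:
    """Tracks a running cost total against a fixed budget (illustrative only)."""

    budget: float
    total: float = 0.0
    steps: int = 0
    violated_at: Optional[int] = None  # 1-based index of the first violating event

    def observe(self, cost: float) -> bool:
        """Consume one event cost; return True while the property still holds."""
        if cost < 0:
            # Assumption: costs never decrease the total, so a violation is final.
            raise ValueError("costs are assumed to be non-negative")
        self.steps += 1
        self.total += cost
        if self.violated_at is None and self.total > self.budget:
            self.violated_at = self.steps
        return self.violated_at is None


def first_violation(costs: Iterable[float], budget: float) -> Optional[int]:
    """Return the 1-based step at which the budget is first exceeded, or None."""
    monitor = CumulativeCostMonitor(budget)
    for cost in costs:
        if not monitor.observe(cost):
            # Stop as soon as the verdict is known; later events cannot undo it.
            return monitor.violated_at
    return None
```

Because the costs are assumed non-negative, the verdict can be issued at the first event that pushes the total over the budget, without waiting for the rest of the trace.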
SY · Apr 13, 2020
Automatic Generation of Hierarchical Contracts for Resilience in Cyber-Physical Systems
Zhiheng Xu, Daniel Jun Xian Ng, Arvind Easwaran
With the growing scale of Cyber-Physical Systems (CPSs), it is challenging to maintain their stability under all operating conditions. Reducing downtime and locating failures become core issues in system design. In this paper, we employ a hierarchical contract-based resilience framework to guarantee the stability of CPS. In this framework, we use Assume-Guarantee (A-G) contracts to monitor the non-functional properties of individual components (e.g., power and latency), and hierarchically compose such contracts to deduce information about faults at the system level. The hierarchical contracts enable rapid fault detection in large-scale CPS. However, due to the vast number of components in CPS, manually designing numerous contracts and the hierarchy becomes challenging. To address this issue, we propose a technique to automatically decompose a root contract into multiple lower-level contracts depending on I/O dependencies between components. We then formulate a multi-objective optimization problem to search for the optimal parameters of each lower-level contract. This enables automatic contract refinement that takes into consideration the communication overhead between components. Finally, we use a case study from the manufacturing domain to experimentally demonstrate the benefits of the proposed framework.
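The way a contract hierarchy can narrow a system-level violation down to a component can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' framework: a contract counts as violated when its assumption holds but its guarantee does not, and fault location simply descends into the children of a violated contract. All names (`Contract`, `locate_faults`, `build_example`), the latency keys, and the thresholds are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Observation = Dict[str, float]


@dataclass
class Contract:
    """An assume-guarantee contract with optional lower-level contracts."""

    name: str
    assume: Callable[[Observation], bool]
    guarantee: Callable[[Observation], bool]
    children: List["Contract"] = field(default_factory=list)

    def violated(self, obs: Observation) -> bool:
        # Violated only if the environment met the assumption
        # and the component still failed its guarantee.
        return self.assume(obs) and not self.guarantee(obs)


def locate_faults(contract: Contract, obs: Observation) -> List[str]:
    """Descend from a violated contract to the lowest violated contracts."""
    if not contract.violated(obs):
        return []
    culprits = [
        name for child in contract.children for name in locate_faults(child, obs)
    ]
    # If no child explains the violation, blame this level.
    return culprits or [contract.name]


def build_example() -> Contract:
    """Hypothetical root latency contract split into two component contracts."""
    always = lambda obs: True
    conveyor = Contract("conveyor", always, lambda o: o["conveyor_ms"] <= 40)
    sorter = Contract("sorter", always, lambda o: o["sorter_ms"] <= 60)
    return Contract(
        "line",
        always,
        lambda o: o["conveyor_ms"] + o["sorter_ms"] <= 100,
        [conveyor, sorter],
    )
```

With these made-up thresholds, `locate_faults(build_example(), {"conveyor_ms": 30, "sorter_ms": 90})` returns `["sorter"]`: the root latency guarantee fails, and only the sorter's lower-level contract is violated. Choosing the per-component thresholds automatically is the part the abstract addresses through optimization; here they are fixed by hand.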
SE · Apr 9, 2020
Demo Abstract: Contract-based Hierarchical Resilience Framework for Cyber-Physical Systems
Daniel Jun Xian Ng, Arvind Easwaran, Sidharta Andalam
This demonstration presents a framework for building a resilient Cyber-Physical Systems (CPS) cyber-infrastructure through the use of hierarchical parametric assume-guarantee contracts. A Fischertechnik Sorting Line with Color Detection training model is used to showcase our framework.
SE · Apr 9, 2020
CLAIR: A Contract-based Framework for Developing Resilient CPS Architectures
Sidharta Andalam, Daniel Jun Xian Ng, Arvind Easwaran et al.
Industrial cyber-infrastructure is normally a multilayered architecture. The purpose of the layered architecture is to hide complexity and allow independent evolution of the layers. In this paper, we argue that this traditional strict layering results in poor transparency across layers, limiting the ability to significantly improve resiliency. We propose a contract-based methodology where components across and within the layers of the cyber-infrastructure are associated with contracts and a light-weight resilience manager. This allows the system to detect faults (contract violations monitored using observers) and react (by changing contracts dynamically) effectively. It results in (1) improved transparency across layers, which helps resiliency; (2) decoupling of fault-handling code from application code, which helps code maintenance; and (3) systematic generation of error-free fault-handling code, which reduces development time. Using an industrial case study, we demonstrate the proposed methodology.
SE · Apr 9, 2020
Contract-based Methodology for Developing Resilient Cyber-Infrastructure in the Industry 4.0 Era
Sidharta Andalam, Daniel Jun Xian Ng, Arvind Easwaran et al.
As the industrial cyber-infrastructure becomes increasingly important to realising the objectives of Industry 4.0, the consequences of disruption due to internal or external faults become increasingly severe. Thus, there is a need for a resilient infrastructure. In this paper, we propose a contract-based methodology where components across layers of the cyber-infrastructure are associated with contracts and a light-weight resilience manager. This allows the system to detect faults (contract violations monitored using observers) and react (by changing contracts dynamically) effectively.
SE · Apr 9, 2020
Contract-based Hierarchical Resilience Management for Cyber-Physical Systems
Mohammad Shihabul Haque, Daniel Jun Xian Ng, Arvind Easwaran et al.
The orchestrated collaborative effort of physical and cyber components to satisfy given requirements is the central concept behind Cyber-Physical Systems (CPS). To ensure the performance of components, a software-based resilience manager is a flexible choice for detecting and recovering from faults quickly. However, a single resilience manager placed at the centre of the system to deal with every fault suffers from decision-making overload and is therefore out of the question for large-scale distributed CPS. On the other hand, prompt detection of failures and efficient recovery from them are challenging for decentralised resilience managers. In this regard, we present a novel resilience management framework that utilises the concept of a management hierarchy. System design contracts play a key role in this framework for prompt fault detection and recovery. Besides the details of the framework, an Industry 4.0 related test case is presented in this article to provide further insights.