Software Fault Tolerance for Cyber-Physical Systems via Full System Restart
This work addresses the reliability of safety-critical embedded control systems by providing a method for fault tolerance via safe system restart, enabling the use of complex controllers like neural networks.
The paper proposes a formally verified controller that guarantees safety of nonlinear cyber-physical systems, enables safe full system restart during runtime, and allows use of complex unverified controllers. The approach was demonstrated on an inverted pendulum and a 3-DOF helicopter, showing safety under various faults.
This paper addresses the reliability of complex embedded control systems in safety-critical environments. We propose a novel approach to designing a controller that (i) guarantees the safety of nonlinear physical systems, (ii) enables safe system restart during runtime, and (iii) allows the use of complex, unverified controllers (e.g., neural networks) that drive the physical systems towards complex specifications. We use an abstraction-based controller synthesis approach to design a formally verified controller that provides application- and system-level fault tolerance along with safety guarantees. Moreover, our approach can be implemented on a commercial-off-the-shelf (COTS) processing unit. To demonstrate the efficacy of our solution and to verify the safety of the system under various types of faults injected into the applications and the underlying real-time operating system (RTOS), we implemented the proposed controller for an inverted pendulum and a three-degree-of-freedom (3-DOF) helicopter.
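As a rough illustration of what abstraction-based safety synthesis can look like, the sketch below computes a maximal controlled-invariant set on a small finite transition system and then uses it to filter the inputs of an unverified controller. Everything in it is an assumption made for illustration: the toy dynamics, the grid of 11 states, the input and disturbance ranges, and the names `successors`, `synthesize_safety_controller`, and `filter_input` are hypothetical and are not taken from the paper. Restart handling is not modeled here.

```python
# Toy finite abstraction (hypothetical, not the paper's model):
# an unstable 1-D system on cells 0..10 that drifts away from cell 5.
STATES = range(11)
INPUTS = range(-3, 4)          # admissible control inputs
DISTURBANCES = (-1, 0, 1)      # bounded nondeterminism of the abstraction
SAFE = set(STATES)             # any successor outside 0..10 is unsafe


def successors(x, u):
    """All abstract successor cells of state x under input u."""
    return {x + (x - 5) + u + w for w in DISTURBANCES}


def synthesize_safety_controller(safe):
    """Fixed-point iteration for the maximal controlled-invariant set.

    A state is kept only if some input forces every successor to stay
    inside the current candidate set. Returns the invariant set and a
    map from each state in it to its list of safe inputs.
    """
    winning = set(safe)
    while True:
        keep = {
            x for x in winning
            if any(successors(x, u) <= winning for u in INPUTS)
        }
        if keep == winning:
            break
        winning = keep
    controller = {
        x: [u for u in INPUTS if successors(x, u) <= winning]
        for x in winning
    }
    return winning, controller


def filter_input(x, u_proposed, controller):
    """Accept the unverified controller's input only if it is safe.

    Otherwise fall back to the safe input closest to the proposal.
    Raises KeyError if x lies outside the invariant set.
    """
    allowed = controller[x]
    if u_proposed in allowed:
        return u_proposed
    return min(allowed, key=lambda u: abs(u - u_proposed))


WINNING, CONTROLLER = synthesize_safety_controller(SAFE)
```

In this toy setting the unverified controller (for example, a neural network) would propose `u_proposed` at each step, and `filter_input` would pass it through unchanged whenever it keeps every possible successor inside `WINNING`. The safety argument rests only on the synthesized map, not on the proposing controller.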