Using Dynamic Binary Instrumentation to Detect Failures in Robotics Software
This work addresses safety concerns for public-facing autonomous systems, but its contribution is incremental, since it builds on existing instrumentation and machine learning methods.
The paper tackles the problem of detecting software errors in safety-critical autonomous and robotics systems. It proposes novel techniques that use dynamic binary instrumentation to collect low-level runtime signals and build machine learning models from them. It demonstrates their efficiency on ARDUPILOT and HUSKY in simulation, with analyses of model accuracy, training-data requirements, overhead, and the usefulness of individual signals.
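As an illustration only, the sketch below shows the general shape of a model built from runtime signals. It trains a plain logistic-regression classifier with batch gradient descent on per-execution signal vectors and then scores unseen executions. Logistic regression is a stand-in chosen for brevity, not necessarily the model family the paper uses. The three features (normalized call counts for a hypothetical navigation update, sensor read, and failsafe handler), the labeled data, and the learning-rate settings are all invented assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that the execution described by x is erroneous."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit logistic regression with batch gradient descent.

    Each row of X is one execution's signal vector; y[i] is 1 if that
    execution was labeled erroneous and 0 if it was nominal.
    """
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss: (predicted - label) * feature.
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Hypothetical signals per execution, normalized to [0, 1]:
# [nav-update calls, sensor-read calls, failsafe-handler calls]
X_train = [
    [1.0, 1.0, 0.0],
    [0.9, 1.0, 0.1],
    [1.0, 0.9, 0.0],
    [0.95, 1.0, 0.05],
    [0.4, 1.0, 0.9],
    [0.5, 0.8, 1.0],
    [0.3, 0.9, 0.8],
    [0.45, 1.0, 0.95],
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = nominal, 1 = erroneous

w, b = train_logistic(X_train, y_train)

# Score two executions that were not in the training set.
for name, x in [("nominal-looking", [0.975, 1.0, 0.025]),
                ("faulty-looking", [0.35, 0.95, 0.85])]:
    print(name, round(predict_proba(w, b, x), 3))
```

In practice the signal vectors would come from an instrumentation tool rather than being typed in by hand, and a threshold on the predicted probability decides which executions to flag for inspection.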
Autonomous and Robotics Systems (ARSs) are widespread, complex, and increasingly coming into contact with the public. Many of these systems are safety-critical, and it is vital to detect software errors to protect against harm. We propose a family of novel techniques to detect unusual program executions and incorrect program behavior. We model execution behavior by collecting low-level signals at run time and using those signals to build machine learning models. These models can identify previously unseen executions that are more likely to exhibit errors. We describe a tractable approach for collecting dynamic binary runtime signals on ARSs, allowing the systems to absorb most of the overhead from dynamic instrumentation. The architecture of ARSs is particularly well suited to hiding instrumentation overhead. We demonstrate the efficiency of these approaches in simulation on ARDUPILOT (a popular open-source autopilot software system) and HUSKY (an unmanned ground vehicle). We instrument executions to gather data from which we build supervised machine learning models of executions, and we evaluate the accuracy of these models. We also analyze the amount of training data needed to develop models with various degrees of accuracy, measure the overhead added to executions that use the analysis tool, and analyze which runtime signals are most useful for detecting unusual behavior in the program under test. In addition, we analyze the effects of timing delays on the functional behavior of ARSs.
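One way to see why ARS architecture can hide instrumentation overhead is a deliberately simplified timing model. Assume, hypothetically, that the controller runs a fixed-period loop of period T, that each iteration does c milliseconds of work and then idles until the next tick, and that instrumentation multiplies every iteration's compute time by a constant factor 1 + o. The overhead is then invisible as long as every inflated iteration still fits in the period, that is, o ≤ T / max(c) − 1. Beyond that point the loop overruns, which is where timing delays can start to affect functional behavior. Real dynamic-instrumentation overhead varies by code path, so this is only a back-of-the-envelope sketch, and the 2.5 ms period and work times below are invented numbers.

```python
from typing import Sequence


def count_deadline_misses(
    work_ms: Sequence[float], period_ms: float, overhead: float
) -> int:
    """Count iterations whose instrumented compute time exceeds the loop period.

    `overhead` is an assumed uniform fractional slowdown from instrumentation;
    for example, 1.0 means every iteration takes twice as long.
    """
    return sum(1 for c in work_ms if c * (1.0 + overhead) > period_ms)


def max_hidden_overhead(work_ms: Sequence[float], period_ms: float) -> float:
    """Largest uniform slowdown that still fits every iteration in its period."""
    return period_ms / max(work_ms) - 1.0


# Hypothetical uninstrumented compute time per iteration (ms)
# for a loop with an assumed 2.5 ms period (400 Hz).
work = [0.8, 1.0, 0.9, 1.2]
period = 2.5

print(max_hidden_overhead(work, period))         # about 1.08, i.e. roughly a 108% slowdown is absorbed
print(count_deadline_misses(work, period, 1.0))  # 0: a 2x slowdown still fits in every period
print(count_deadline_misses(work, period, 1.5))  # 1: the 1.2 ms iteration now overruns
```

The model ignores jitter, scheduling, and non-uniform overhead; its only purpose is to show that idle time in each control period is what absorbs the instrumentation cost.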