Adaptive Stress Testing of Trajectory Predictions in Flight Management Systems
This work helps flight management system developers identify potential safety-critical failures before deployment. The contribution is incremental: it extends the existing adaptive stress testing formulation to sequential decision-making problems with episodic reward.
The paper addresses the problem of finding failure events in flight-critical systems by applying adaptive stress testing to a trajectory predictor from a developmental commercial flight management system. The method found more failures, and failures with higher likelihood, than the baseline approaches of direct Monte Carlo simulation and the cross-entropy method.
To find failure events and their likelihoods in flight-critical systems, we investigate the use of an advanced black-box stress testing approach called adaptive stress testing. We analyze a trajectory predictor from a developmental commercial flight management system which takes as input a collection of lateral waypoints and en-route environmental conditions. Our aim is to search for failure events relating to inconsistencies in the predicted lateral trajectories. The intention of this work is to find likely failures and report them back to the developers so they can address and potentially resolve shortcomings of the system before deployment. To improve search performance, this work extends the adaptive stress testing formulation to be applied more generally to sequential decision-making problems with episodic reward by collecting the state transitions during the search and evaluating at the end of the simulated rollout. We use a modified Monte Carlo tree search algorithm with progressive widening as our adversarial reinforcement learner. The performance is compared to direct Monte Carlo simulations and to the cross-entropy method as an alternative importance sampling baseline. The goal is to find potential problems otherwise not found by traditional requirements-based testing. Results indicate that our adaptive stress testing approach finds more failures and finds failures with higher likelihood relative to the baseline approaches.
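To make two of the ideas above concrete, the following minimal Python sketch illustrates (a) an episodic reward that is evaluated only after the rollout ends, from transitions collected along the way, and (b) a common form of the progressive widening test used in Monte Carlo tree search. It is not the authors' implementation. The toy random-walk simulator, the failure condition (final state magnitude above `threshold`), the constant `miss_penalty`, the widening parameters `k` and `alpha`, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def may_widen(num_children: int, num_visits: int,
              k: float = 1.0, alpha: float = 0.5) -> bool:
    """Progressive widening test (one common form, assumed here):
    a node may add a new child only while |children| < k * N^alpha."""
    return num_children < k * max(num_visits, 1) ** alpha


def gaussian_logpdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Log-density of a normal distribution, used as transition log-likelihood."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def episodic_rollout(seed: int, horizon: int = 5, threshold: float = 4.0,
                     miss_penalty: float = 1000.0):
    """Run one rollout of a toy random walk driven by sampled disturbances.

    Transitions are collected during the rollout; the failure check and the
    reward are computed once, at the end (episodic reward).
    """
    rng = random.Random(seed)
    state = 0.0
    log_likelihood = 0.0
    transitions = []
    for _ in range(horizon):
        disturbance = rng.gauss(0.0, 1.0)        # hypothetical environment noise
        log_likelihood += gaussian_logpdf(disturbance)
        state += disturbance
        transitions.append(state)                # collect, do not evaluate yet

    # Hypothetical failure criterion, evaluated on the completed rollout.
    is_failure = abs(transitions[-1]) > threshold
    # Likely failures score highest; non-failures receive a fixed penalty.
    reward = log_likelihood if is_failure else log_likelihood - miss_penalty
    return reward, is_failure, transitions


if __name__ == "__main__":
    reward, is_failure, transitions = episodic_rollout(seed=0)
    print(f"failure={is_failure}, reward={reward:.2f}, steps={len(transitions)}")
    print("widen at 1 child, 4 visits:", may_widen(1, 4))
```

In this sketch, a search algorithm would choose the random seeds (or disturbances) to maximize the returned reward, so that rollouts ending in failure with high transition likelihood are preferred. The widening test limits how quickly new disturbance branches are added at a tree node as its visit count grows.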