Bayesian Deployment Approval for Learned Landing Controllers under Finite Rollout Validation
This work addresses the problem of statistically rigorous deployment validation for learned controllers in safety-critical autonomous systems, where empirical metrics can be overconfident.
This work develops a Bayesian approval framework for learned autonomous landing controllers, using posterior inference to quantify deployment readiness under finite rollout evidence. The framework provides more uncertainty-calibrated assessments than empirical success frequency, as demonstrated with PPO and SAC controllers.
Reinforcement learning and other data-driven autonomous controllers are commonly evaluated by cumulative reward and empirical success frequency over a finite number of simulated trajectories. However, such empirical metrics do not necessarily provide sufficient statistical evidence of deployment readiness under uncertainty. This work develops a Bayesian approval framework for learned autonomous landing controllers under finite rollout evidence. A probabilistic formulation of landing capability is introduced, based on whether touchdown safety is satisfied under uncertain operating conditions, and Bayesian posterior inference is used to quantify uncertainty about the true deployment capability of a learned policy. Posterior approval probability and posterior deployment risk are then introduced for deployment-oriented evaluation, together with a sequential validation framework that supports approve/reject/continue decisions during progressive rollout testing. Simulation experiments with PPO and SAC controllers demonstrate that empirical success and reward optimization may lead to overconfident interpretations of deployment readiness under limited validation evidence, whereas posterior approval inference provides a more uncertainty-calibrated assessment. The proposed framework provides a practical statistical connection between conventional reinforcement-learning evaluation and deployment-oriented validation under uncertainty, and it may generalize to broader classes of learned autonomous systems.
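
To make the approve/reject/continue idea concrete, the following minimal sketch shows one way a posterior approval probability could be computed from rollout outcomes. It is not the paper's implementation. It assumes each rollout is an independent pass/fail touchdown outcome with unknown success probability p, a Beta prior with integer parameters (uniform by default), and hypothetical thresholds `p_req`, `approve_at`, and `reject_at`; the function names are likewise illustrative. Posterior deployment risk is taken here to be the complement of the approval probability, which is an assumption and may differ from the paper's definition.

```python
from math import comb


def posterior_approval_probability(successes, trials, p_req, prior_a=1, prior_b=1):
    """Return P(p >= p_req | data) under a Beta(prior_a, prior_b) prior.

    Assumes integer prior parameters, so the Beta tail probability can be
    computed exactly as a binomial sum:
        P(p > x) = P(Binomial(a + b - 1, x) <= a - 1)
    where Beta(a, b) is the posterior.
    """
    a = prior_a + successes             # posterior "success" parameter
    b = prior_b + (trials - successes)  # posterior "failure" parameter
    m = a + b - 1
    return sum(comb(m, k) * p_req**k * (1 - p_req) ** (m - k) for k in range(a))


def sequential_decision(successes, trials, p_req=0.95, approve_at=0.95, reject_at=0.05):
    """Map the current rollout evidence to approve / reject / continue.

    The thresholds are hypothetical placeholders, not values from the paper.
    """
    prob = posterior_approval_probability(successes, trials, p_req)
    if prob >= approve_at:
        return "approve", prob
    if prob <= reject_at:
        return "reject", prob
    return "continue", prob


if __name__ == "__main__":
    # A controller with 20 successful landings out of 20 rollouts has an
    # empirical success frequency of 1.0, yet the evidence is still limited.
    decision, prob = sequential_decision(20, 20)
    risk = 1.0 - prob  # assumed definition of posterior deployment risk
    print(decision, round(prob, 3), round(risk, 3))
```

Under these assumptions, a perfect 20-out-of-20 record still leaves substantial posterior mass below the required capability level, so the sketch asks for more rollouts rather than approving, which mirrors the overconfidence concern raised in the abstract.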