Stronger Generalization Guarantees for Robot Learning by Combining Generative Models and Real-World Data
This work addresses the challenge of ensuring safe and reliable robot learning in novel settings, although its contribution is incremental, since it builds on existing PAC-Bayes theory.
The paper learns robotic policies with generalization guarantees for unseen environments by combining a generative model with real-world data. This yields stronger generalization guarantees than prior work, demonstrated on simulated quadrotor navigation and grasping tasks, with hardware validation for grasping.
We are motivated by the problem of learning policies for robotic systems with rich sensory inputs (e.g., vision) in a manner that allows us to guarantee generalization to environments unseen during training. We present a framework for obtaining such generalization guarantees by leveraging a finite dataset of real-world environments in combination with a (potentially inaccurate) generative model of environments. The key idea behind our approach is to use the generative model to implicitly specify a prior over policies. This prior is then updated using the real-world dataset of environments by minimizing an upper bound, derived via Probably Approximately Correct (PAC)-Bayes generalization theory, on the expected cost across novel environments. We demonstrate our approach on two simulated systems with nonlinear/hybrid dynamics and rich sensing modalities: (i) quadrotor navigation with an onboard vision sensor, and (ii) grasping objects using a depth sensor. Comparisons with prior work demonstrate that our approach obtains stronger generalization guarantees by utilizing generative models. We also present hardware experiments validating our bounds for the grasping task.
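To make the bound-minimization step concrete, the following is a minimal Python sketch under simplifying assumptions that do not come from the paper: the policy space is a small finite set of candidate policies, the list `prior` stands in for the prior the generative model would induce, each policy's cost in an environment lies in [0, 1], and `costs` holds each policy's average cost over the N real-world environments. It evaluates a McAllester-style PAC-Bayes bound (in Maurer's form) and searches over Gibbs posteriors on a grid of inverse temperatures. All function names and the Gibbs-posterior search are hypothetical illustrations, not the authors' algorithm.

```python
import math
from typing import List, Sequence, Tuple


def kl_divergence(q: Sequence[float], p: Sequence[float]) -> float:
    """KL(Q || P) for discrete distributions over the same finite policy set."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def pac_bayes_bound(emp_cost: float, kl: float, n: int, delta: float) -> float:
    """McAllester-style bound (Maurer's form) on expected cost, costs in [0, 1].

    With probability >= 1 - delta over n i.i.d. environments (n >= 8), this
    holds simultaneously for every posterior Q, given a data-independent prior P.
    """
    slack = math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))
    return emp_cost + slack


def gibbs_posterior(prior: Sequence[float], costs: Sequence[float],
                    lam: float) -> List[float]:
    """Q(i) proportional to P(i) * exp(-lam * cost_i), shifted for stability."""
    c_min = min(costs)
    weights = [p * math.exp(-lam * (c - c_min)) for p, c in zip(prior, costs)]
    total = sum(weights)
    return [w / total for w in weights]


def best_bound(prior: Sequence[float], costs: Sequence[float], n: int,
               delta: float, lams: Sequence[float]
               ) -> Tuple[float, float, List[float]]:
    """Return (bound, lam, posterior) minimizing the bound over a lam grid."""
    best = None
    for lam in lams:
        q = gibbs_posterior(prior, costs, lam)
        emp = sum(qi * ci for qi, ci in zip(q, costs))  # empirical cost under Q
        bound = pac_bayes_bound(emp, kl_divergence(q, prior), n, delta)
        if best is None or bound < best[0]:
            best = (bound, lam, q)
    return best


if __name__ == "__main__":
    # Hypothetical numbers: 3 candidate policies, 200 real environments.
    prior = [0.5, 0.3, 0.2]
    costs = [0.10, 0.30, 0.60]
    bound, lam, q = best_bound(prior, costs, n=200, delta=0.01,
                               lams=[0.0, 1.0, 5.0, 20.0, 100.0])
    print(f"bound={bound:.3f} at lam={lam}, posterior={q}")
```

Because the bound holds for every posterior at once as long as the prior is fixed before the real-world data are seen, choosing the posterior that minimizes the bound on that same data does not invalidate the guarantee. This is why the prior is built from the generative model rather than from the real environments.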