Sparse Gaussian Process Temporal Difference Learning for Marine Robot Navigation
This work addresses the data efficiency and online applicability of Temporal Difference learning for robots that learn to navigate in marine environments, and improves incrementally on existing sparse methods.
The paper tackles data-efficient Temporal Difference (TD) learning for marine robot navigation by reducing TD updates to Gaussian Process regression and introducing a sparse approximation, which yields the SPGP-SARSA algorithm. In simulation studies and physical trials with an underwater robot, SPGP-SARSA can outperform the state-of-the-art sparse method, replicate the prediction quality of its exact counterpart, and solve underwater navigation tasks.
We present a method for Temporal Difference (TD) learning that addresses several challenges faced by robots learning to navigate in a marine environment. For improved data efficiency, our method reduces TD updates to Gaussian Process regression. To make predictions amenable to online settings, we introduce a sparse approximation with improved quality over current rejection-based sparse methods. We derive the predictive value function posterior and use the moments to obtain a new algorithm for model-free policy evaluation, SPGP-SARSA. With simple changes, we show SPGP-SARSA can be reduced to a model-based equivalent, SPGP-TD. We perform comprehensive simulation studies and also conduct physical learning trials with an underwater robot. Our results show SPGP-SARSA can outperform the state-of-the-art sparse method, replicate the prediction quality of its exact counterpart, and be applied to solve underwater navigation tasks.
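To illustrate what reducing TD updates to Gaussian Process regression can look like, the following is a minimal sketch of an exact (non-sparse) GP-based TD predictor in the style of the general GPTD literature. It is not the SPGP-SARSA algorithm itself. It assumes a zero-mean GP prior on the value function, a squared-exponential kernel, scalar inputs standing in for state-action pairs, and rewards modelled as r_t = V(x_t) - gamma * V(x_{t+1}) + Gaussian noise. The function names (rbf, solve, gptd_mean) and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel on scalar inputs (an assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gptd_mean(inputs, rewards, gamma, noise_var, query):
    """Posterior mean of the value at `query` given one trajectory.

    `inputs` has T+1 entries and `rewards` has T entries. Reward t is
    modelled as value(inputs[t]) - gamma * value(inputs[t+1]) + noise.
    """
    T = len(rewards)

    def diff_cov(i, j):
        # Prior covariance between temporal differences i and j: (H K H^T)[i][j]
        return (rbf(inputs[i], inputs[j])
                - gamma * rbf(inputs[i], inputs[j + 1])
                - gamma * rbf(inputs[i + 1], inputs[j])
                + gamma ** 2 * rbf(inputs[i + 1], inputs[j + 1]))

    # Covariance of the observed rewards: H K H^T + noise_var * I
    G = [[diff_cov(i, j) + (noise_var if i == j else 0.0) for j in range(T)]
         for i in range(T)]
    alpha = solve(G, rewards)
    # Cross-covariance between value(query) and temporal difference t
    return sum(alpha[t] * (rbf(query, inputs[t]) - gamma * rbf(query, inputs[t + 1]))
               for t in range(T))


# Toy trajectory with made-up rewards (illustrative data, not from the paper).
states = [0.0, 1.0, 2.0, 3.0]
rewards = [1.0, 0.0, 2.0]
values = [gptd_mean(states, rewards, 0.9, 1e-6, s) for s in states]
print(values)
```

The linear solve in this exact form costs cubic time in the number of observed transitions, which is the kind of cost a sparse approximation is meant to avoid in online settings.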