RT-HCP: Dealing with Inference Delays and Sample Efficiency to Learn Directly on Robotic Platforms
This work addresses inference delays and sample efficiency in robotic control, but the contribution is incremental because it builds on existing model-based RL methods.
The paper tackles the challenge of learning controllers directly on robots by addressing sample efficiency and inference delays. It proposes RT-HCP, which trades off performance, sample efficiency, and inference time, and evaluates it with experiments on a Furuta pendulum platform.
Learning a controller directly on the robot requires extreme sample efficiency. Model-based reinforcement learning (RL) methods are the most sample efficient, but their inference time is often too long to meet the robot's control frequency requirements. In this paper, we address the sample efficiency and inference time challenges with two contributions. First, we define a general framework to deal with inference delays, in which the slow-inference robot controller provides a sequence of actions to feed the control-hungry robotic platform without execution gaps. Then, we compare several RL algorithms in light of this framework and propose RT-HCP, an algorithm that offers an excellent trade-off between performance, sample efficiency, and inference time. We validate the superiority of RT-HCP with experiments in which we learn a controller directly on a simple but high-frequency Furuta pendulum platform. Code: github.com/elasriz/RTHCP
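The action-sequence idea behind the framework can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' implementation: the names `run_delayed_control`, `toy_plan`, `delay_steps`, `horizon`, and `default_action` are hypothetical, inference is assumed to take a fixed whole number of control periods, and each plan is assumed to target the steps that follow its delivery. Under these assumptions, the robot never runs out of actions as long as each plan holds at least as many actions as the inference delay.

```python
from collections import deque


def toy_plan(start_step, ready_step, horizon):
    """Hypothetical planner: the action for step s is simply s, so alignment is easy to check."""
    return [float(ready_step + i) for i in range(horizon)]


def run_delayed_control(plan_fn, n_steps, delay_steps, horizon, default_action=0.0):
    """Simulate a controller whose inference takes `delay_steps` control periods.

    Returns a list of (step, action, from_plan) tuples, one per control step.
    """
    if delay_steps < 1:
        raise ValueError("delay_steps must be at least 1")
    if horizon < delay_steps:
        raise ValueError("horizon must be >= delay_steps to avoid execution gaps")

    buffer = deque()   # actions ready to be executed by the robot
    pending = None     # (ready_step, actions) for the inference in flight
    executed = []

    for step in range(n_steps):
        # A finished plan replaces whatever remains of the previous one.
        if pending is not None and pending[0] == step:
            buffer = deque(pending[1])
            pending = None

        # Start a new inference as soon as the planner is idle.
        if pending is None:
            ready_step = step + delay_steps
            pending = (ready_step, plan_fn(step, ready_step, horizon))

        # The robot needs an action at every step; fall back to a default
        # only while no plan has been delivered yet.
        if buffer:
            executed.append((step, buffer.popleft(), True))
        else:
            executed.append((step, default_action, False))

    return executed
```

Calling `run_delayed_control(toy_plan, n_steps=10, delay_steps=3, horizon=5)` executes the default action only during the first three steps, while the first inference is still running. In a real controller the planner would presumably also have to account for how the state evolves during the delay, for example by predicting it with a learned model; that part is left out of this sketch.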