A Framework for Learning from Demonstration with Minimal Human Effort
This work addresses the problem of minimizing human time and effort in robot learning for shared autonomy settings, presenting an incremental improvement over existing methods.
The paper tackles the problem of reducing human effort in robot learning from demonstration by using a contextual multi-armed bandit to select between human teleoperation and autonomous controllers based on predicted success probabilities, showing reduced human cost in simulated and real-world tasks.
We consider robot learning in the context of shared autonomy, where control of the system can switch between a human teleoperator and autonomous control. In this setting, we address reinforcement learning and learning from demonstration where there is a cost associated with human time. This cost represents the human time required to teleoperate the robot or to recover the robot from failures. For each episode, the agent must choose between requesting human teleoperation and using one of its autonomous controllers. In our approach, we learn to predict the success probability of each controller given the initial state of an episode. These predictions are used in a contextual multi-armed bandit algorithm to choose the controller for the episode. A controller is learnt online from demonstrations and reinforcement learning, so that autonomous performance improves and the system becomes less reliant on the teleoperator as it gains experience. We show that our approach to controller selection reduces the human cost of performing two simulated tasks and a single real-world task.
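To make the per-episode selection step concrete, the following is a minimal sketch of one way such a choice could be implemented. It is not the paper's method. It assumes an online logistic-regression success predictor, a UCB-style optimism bonus, a fixed teleoperation cost, a fixed failure-recovery cost, and a human teleoperator who always succeeds. All names (`SuccessModel`, `choose_controller`, `simulate`, `teleop_cost`, `failure_cost`, `bonus`) are hypothetical, and the toy environment exists only for illustration. Learning the controller itself from demonstrations is not modelled.

```python
import math
import random


class SuccessModel:
    """Online logistic regression estimating P(success | initial state).

    Assumed stand-in for the learned success predictor.
    """

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * (dim + 1)  # last entry is the bias term
        self.lr = lr
        self.n = 0  # number of episodes this controller has been run

    def predict(self, state):
        # zip stops at len(state), so the bias is added separately.
        z = self.w[-1] + sum(w * x for w, x in zip(self.w, state))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, state, success):
        # One stochastic-gradient step on the log-likelihood.
        err = float(success) - self.predict(state)
        for i, x in enumerate(state):
            self.w[i] += self.lr * err * x
        self.w[-1] += self.lr * err
        self.n += 1


def choose_controller(state, models, teleop_cost, failure_cost, t, bonus=1.0):
    """Return "human" or the name of the autonomous controller with the
    lowest optimistic expected human cost (assumed cost model)."""
    best, best_cost = "human", teleop_cost  # assumption: teleoperation always succeeds
    for name, model in models.items():
        # UCB-style bonus: rarely tried controllers get an optimistic estimate.
        explore = bonus * math.sqrt(math.log(t + 1) / (model.n + 1))
        p = min(1.0, model.predict(state) + explore)
        cost = (1.0 - p) * failure_cost  # human cost is paid only on failure
        if cost < best_cost:
            best, best_cost = name, cost
    return best


def simulate(episodes=500, teleop_cost=1.0, failure_cost=5.0, seed=0):
    """Toy loop: the single autonomous controller succeeds only when state > 0."""
    rng = random.Random(seed)
    models = {"auto": SuccessModel(dim=1)}
    total_cost = 0.0
    for t in range(1, episodes + 1):
        state = [rng.uniform(-1.0, 1.0)]
        choice = choose_controller(state, models, teleop_cost, failure_cost, t)
        if choice == "human":
            total_cost += teleop_cost
        else:
            success = state[0] > 0.0
            models[choice].update(state, success)
            if not success:
                total_cost += failure_cost
    return models, total_cost
```

Calling `simulate()` returns the trained models and the accumulated human cost. Setting `bonus=0.0` in `choose_controller` gives a purely greedy choice, which is useful for inspecting what the predictor has learned once exploration is no longer wanted.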