Multi-task Reinforcement Learning with a Planning Quasi-Metric
This work addresses training efficiency in multi-task reinforcement learning for robotics and control tasks. It builds on existing methods, so the contribution may appear incremental, although the decomposition it proposes is novel.
The paper tackles multi-task reinforcement learning by combining a planning quasi-metric with task-specific aimers, and reports a multiple-fold training speed-up on the bit-flip problem and in the MuJoCo robotic arm simulator.
We introduce a new reinforcement learning approach combining a planning quasi-metric (PQM) that estimates the number of steps required to go from any state to another, with task-specific "aimers" that compute a target state to reach a given goal. This decomposition allows the sharing across tasks of a task-agnostic model of the quasi-metric that captures the environment's dynamics and can be learned in a dense and unsupervised manner. We achieve multiple-fold training speed-up compared to recently published methods on the standard bit-flip problem and in the MuJoCo robotic arm simulator.
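The decomposition can be illustrated with a minimal Python sketch on a toy bit-flip environment. This is not the authors' implementation: in place of a learned quasi-metric, the table here is computed exactly by breadth-first search, and the two aimers are hand-written stand-ins for learned ones. All names (`build_quasi_metric`, `identity_aimer`, `inverted_aimer`, `act`, `rollout`) and the 4-bit size are assumptions made for illustration. The point it shows is that a single task-agnostic quasi-metric can be reused by several tasks, each of which only supplies its own aimer.

```python
from collections import deque
from itertools import product

N_BITS = 4  # toy size, chosen only for illustration


def step(state, action):
    """Environment dynamics: flip bit `action` of `state` (a tuple of 0/1)."""
    flipped = list(state)
    flipped[action] ^= 1
    return tuple(flipped)


def build_quasi_metric(n_bits):
    """Tabular d(s, s') = number of steps needed to go from s to s'.

    Assumption: computed exactly by BFS here; the abstract describes a
    learned model instead.
    """
    table = {}
    for start in product((0, 1), repeat=n_bits):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            s = queue.popleft()
            for a in range(n_bits):
                nxt = step(s, a)
                if nxt not in dist:
                    dist[nxt] = dist[s] + 1
                    queue.append(nxt)
        for target, d in dist.items():
            table[(start, target)] = d
    return table


def identity_aimer(goal):
    """Hypothetical aimer for a task whose goal is itself the target state."""
    return tuple(goal)


def inverted_aimer(goal):
    """Hypothetical aimer for a second task: target the bitwise complement."""
    return tuple(1 - b for b in goal)


def act(state, goal, aimer, quasi_metric):
    """Pick the action whose successor is closest to the aimer's target."""
    target = aimer(goal)
    return min(
        range(len(state)),
        key=lambda a: quasi_metric[(step(state, a), target)],
    )


def rollout(state, goal, aimer, quasi_metric, max_steps=20):
    """Follow the greedy policy until the target state is reached."""
    target = aimer(goal)
    steps = 0
    while state != target and steps < max_steps:
        state = step(state, act(state, goal, aimer, quasi_metric))
        steps += 1
    return state, steps


# One shared, task-agnostic table; two tasks that differ only in their aimer.
PQM = build_quasi_metric(N_BITS)
start, goal = (0, 0, 0, 0), (1, 0, 1, 1)
final_a, steps_a = rollout(start, goal, identity_aimer, PQM)
final_b, steps_b = rollout(start, goal, inverted_aimer, PQM)
```

In this toy setting the quasi-metric happens to be symmetric (it equals the Hamming distance), but nothing in `act` or `rollout` relies on symmetry, so the same control loop would apply when the cost of going from s to s' differs from the cost of the return trip.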