LG AI ROApr 9, 2024

Efficient Multi-Task Reinforcement Learning via Task-Specific Action Correction

Jinyuan Feng, Min Chen, Zhiqiang Pu, Tenghai Qiu, Jianqiang Yi

arXiv:2404.05950v16.41 citationsh-index: 14

Originality Incremental advance

AI Analysis

This work addresses efficiency and interference issues in multi-task reinforcement learning for robotics, representing an incremental improvement over existing methods.

The paper tackles performance conflicts and negative interference in multi-task reinforcement learning by proposing Task-Specific Action Correction (TSAC), which decomposes policy learning into shared and correction policies with goal-oriented rewards, resulting in improved sample efficiency and action execution on Meta-World benchmarks.

Multi-task reinforcement learning (MTRL) demonstrate potential for enhancing the generalization of a robot, enabling it to perform multiple tasks concurrently. However, the performance of MTRL may still be susceptible to conflicts between tasks and negative interference. To facilitate efficient MTRL, we propose Task-Specific Action Correction (TSAC), a general and complementary approach designed for simultaneous learning of multiple tasks. TSAC decomposes policy learning into two separate policies: a shared policy (SP) and an action correction policy (ACP). To alleviate conflicts resulting from excessive focus on specific tasks' details in SP, ACP incorporates goal-oriented sparse rewards, enabling an agent to adopt a long-term perspective and achieve generalization across tasks. Additional rewards transform the original problem into a multi-objective MTRL problem. Furthermore, to convert the multi-objective MTRL into a single-objective formulation, TSAC assigns a virtual expected budget to the sparse rewards and employs Lagrangian method to transform a constrained single-objective optimization into an unconstrained one. Experimental evaluations conducted on Meta-World's MT10 and MT50 benchmarks demonstrate that TSAC outperforms existing state-of-the-art methods, achieving significant improvements in both sample efficiency and effective action execution.

View on arXiv PDF

Similar