LG RO Nov 11, 2024

GSL-PCD: Improving Generalist-Specialist Learning with Point Cloud Feature-based Task Partitioning

arXiv:2411.06733v1

Originality: Incremental advance

AI Analysis

This work addresses efficiency and performance issues in generalist-specialist learning for robotic manipulation, representing an incremental improvement over existing methods.

The paper tackles inefficient generalization in deep reinforcement learning across environment variations. It proposes GSL-PCD, which partitions tasks using point cloud features so that similar variations are assigned to the same specialist. On robotic manipulation tasks from the ManiSkill benchmark, this outperforms vanilla partitioning by 9.4% with a fixed number of specialists, and reaches comparable performance with 50% lower computational and sample requirements.

Generalization in Deep Reinforcement Learning (DRL) across unseen environment variations often requires training over a diverse set of scenarios. Many existing DRL algorithms struggle with efficiency when handling numerous variations. The Generalist-Specialist Learning (GSL) framework addresses this by first training a generalist model on all variations, then creating specialists from the generalist's weights, each focusing on a subset of variations. The generalist then refines its learning with assistance from the specialists. However, random task partitioning in GSL can impede performance by assigning vastly different variations to the same specialist, often resulting in each specialist focusing on only one variation, which raises computational costs. To improve this, we propose Generalist-Specialist Learning with Point Cloud Feature-based Task Partitioning (GSL-PCD). Our approach clusters environment variations based on features extracted from object point clouds and uses balanced clustering with a greedy algorithm to assign similar variations to the same specialist. Evaluations on robotic manipulation tasks from the ManiSkill benchmark demonstrate that point cloud feature-based partitioning outperforms vanilla partitioning by 9.4%, with a fixed number of specialists, and reduces computational and sample requirements by 50% to achieve comparable performance.
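The partitioning step described above can be illustrated with a minimal sketch. The abstract does not specify the feature extractor or the exact greedy rule, so everything here is an assumption: each variation is taken to already have a fixed-length feature vector (for example, from some point cloud encoder), every specialist gets the same capacity, and `farthest_point_init` and `balanced_greedy_partition` are hypothetical names for illustration, not the authors' code.

```python
import numpy as np


def farthest_point_init(features, k):
    """Pick k initial centroids deterministically by farthest-point sampling."""
    chosen = [0]
    min_d = ((features - features[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_d))  # point farthest from all chosen centroids
        chosen.append(nxt)
        min_d = np.minimum(min_d, ((features - features[nxt]) ** 2).sum(axis=1))
    return features[chosen].copy()


def balanced_greedy_partition(features, num_specialists, num_iters=10):
    """Assign each variation to a specialist, with at most ceil(N / K) each.

    Assumption: `features` is an (N, D) array of per-variation features.
    This is an illustrative balanced k-means variant, not the paper's method.
    """
    features = np.asarray(features, dtype=float)
    n = len(features)
    capacity = -(-n // num_specialists)  # ceil(n / num_specialists)
    centroids = farthest_point_init(features, num_specialists)
    labels = np.full(n, -1)
    for _ in range(num_iters):
        # Squared distance from every variation to every centroid: shape (N, K).
        diffs = features[:, None, :] - centroids[None, :, :]
        dists = (diffs ** 2).sum(axis=2)
        labels = np.full(n, -1)
        counts = np.zeros(num_specialists, dtype=int)
        # Greedy step: visit (variation, cluster) pairs from closest to
        # farthest and accept a pair if the variation is still unassigned
        # and the cluster still has room.
        for flat in np.argsort(dists, axis=None):
            i, k = divmod(int(flat), num_specialists)
            if labels[i] == -1 and counts[k] < capacity:
                labels[i] = k
                counts[k] += 1
        # Move each centroid to the mean of its assigned variations.
        for k in range(num_specialists):
            members = features[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return labels


# Toy usage: two well-separated groups of four hypothetical feature vectors.
toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
toy_labels = balanced_greedy_partition(toy, num_specialists=2)
```

The capacity constraint is what keeps one specialist from absorbing most variations while another handles a single one. The greedy pass is a simple stand-in for an optimal assignment, trading exactness for speed.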
