SoftGPT: Learn Goal-oriented Soft Object Manipulation Skills by Generative Pre-trained Heterogeneous Graph Transformer
This work addresses the problem of teaching robots to manipulate soft objects in domestic settings, which is incremental as it builds on existing pre-trained models and skill learning techniques.
The paper tackles the challenge of robotic soft object manipulation by proposing SoftGPT, a pre-trained model that uses a heterogeneous graph representation and GPT-based dynamics to learn skills from exploration data, enabling efficient learning of various manipulation tasks with potential for direct human demonstration.
Soft object manipulation tasks in domestic scenes pose a significant challenge for existing robotic skill learning techniques due to their complex dynamics and variable shape characteristics. Since learning new manipulation skills from human demonstration is an effective way for robot applications, developing prior knowledge of the representation and dynamics of soft objects is necessary. In this regard, we propose a pre-trained soft object manipulation skill learning model, namely SoftGPT, that is trained using large amounts of exploration data, consisting of a three-dimensional heterogeneous graph representation and a GPT-based dynamics model. For each downstream task, a goal-oriented policy agent is trained to predict the subsequent actions, and SoftGPT generates the consequences of these actions. Integrating these two approaches establishes a thinking process in the robot's mind that provides rollout for facilitating policy learning. Our results demonstrate that leveraging prior knowledge through this thinking process can efficiently learn various soft object manipulation skills, with the potential for direct learning from human demonstrations.