RO · AI · LG · Aug 12, 2024

Body Transformer: Leveraging Robot Embodiment for Policy Learning

arXiv:2408.06316v1 · 36 citations · h-index: 19 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses robot policy learning by improving how policies are represented. It is an incremental advance, building on existing transformer architectures with domain-specific modifications.

The paper tackles a limitation of vanilla transformers: they do not fully exploit the robot's embodiment during policy learning. It proposes Body Transformer (BoT), which represents the robot body as a graph and uses masked attention to inject this structure as an inductive bias. The result improves task completion, scaling, and computational efficiency over vanilla transformer and multilayer perceptron baselines.
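To make "graph representation" concrete, here is a minimal sketch of turning a robot body graph into a boolean attention mask. It assumes each sensor/actuator node may attend only to itself and its direct neighbours in the kinematic graph. The function name `body_graph_mask` and the 4-node chain are hypothetical illustrations, not taken from the BoT code release.

```python
import numpy as np


def body_graph_mask(num_nodes, edges):
    """Boolean attention mask from an undirected body graph (hypothetical helper).

    mask[i, j] is True when node i may attend to node j, i.e. when
    i == j or (i, j) is an edge of the robot's kinematic graph.
    """
    mask = np.eye(num_nodes, dtype=bool)  # every node attends to itself
    for i, j in edges:
        mask[i, j] = True
        mask[j, i] = True  # undirected edge: attention allowed both ways
    return mask


# Hypothetical 4-node chain: torso - hip - knee - foot
edges = [(0, 1), (1, 2), (2, 3)]
mask = body_graph_mask(4, edges)
print(mask.astype(int))
# [[1 1 0 0]
#  [1 1 1 0]
#  [0 1 1 1]
#  [0 0 1 1]]
```

Under this assumption, the mask is the graph's adjacency matrix plus the identity, so information travels one hop per masked attention layer. Stacking layers widens each node's receptive field across the body.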

In recent years, the transformer architecture has become the de facto standard for machine learning algorithms applied to natural language processing and computer vision. Despite notable evidence of successful deployment of this architecture in the context of robot learning, we claim that vanilla transformers do not fully exploit the structure of the robot learning problem. Therefore, we propose Body Transformer (BoT), an architecture that leverages the robot embodiment by providing an inductive bias that guides the learning process. We represent the robot body as a graph of sensors and actuators, and rely on masked attention to pool information throughout the architecture. The resulting architecture outperforms the vanilla transformer, as well as the classical multilayer perceptron, in terms of task completion, scaling properties, and computational efficiency when representing either imitation or reinforcement learning policies. Additional material including the open-source code is available at https://sferrazza.cc/bot_site.
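The abstract says BoT relies on masked attention to pool information across the body graph. The sketch below shows single-head scaled dot-product attention in which disallowed node pairs are set to `-inf` before the softmax, so they receive exactly zero weight. It assumes one token per sensor/actuator node and a mask whose diagonal is always True, so every row has at least one allowed entry. The names `masked_attention`, `w_q`, `w_k`, and `w_v` are illustrative and do not come from the paper's implementation.

```python
import numpy as np


def masked_attention(x, w_q, w_k, w_v, mask):
    """Single-head scaled dot-product attention restricted by a boolean mask.

    x: (num_nodes, d_model), one token per sensor/actuator node.
    mask: (num_nodes, num_nodes) bool, True where attention is allowed.
    Assumes mask has a True diagonal so no softmax row is empty.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # block non-neighbour pairs
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # exp(-inf) == 0 for blocked pairs
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Hypothetical 4-node chain body graph (adjacency plus self-loops)
rng = np.random.default_rng(0)
num_nodes, d = 4, 8
mask = np.eye(num_nodes, dtype=bool)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    mask[i, j] = mask[j, i] = True

x = rng.normal(size=(num_nodes, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, weights = masked_attention(x, w_q, w_k, w_v, mask)
print(out.shape)  # (4, 8)
print(bool(np.all(weights[~mask] == 0.0)))  # True
```

Replacing the mask with an all-True matrix recovers ordinary (vanilla) self-attention. This makes it easy to compare the structured and unstructured variants under otherwise identical weights.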
