RO · AI · CV · LG · Feb 8, 2022

Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning

arXiv:2202.03957v1 · 11 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses a specific issue in continuous-control RL for robotics, the parameterization of 3D rotation outputs, and offers an incremental improvement over standard Gaussian policies.

The paper tackles the problem of representing 3D rotations in reinforcement learning by proposing a Bingham Policy Parameterization (BPP) as an alternative to Gaussian policies, showing it improves rotation prediction in tasks like the Wahba problem and robot manipulation.

We propose a new policy parameterization for representing 3D rotations during reinforcement learning. In the continuous control reinforcement learning literature today, many stochastic policy parameterizations are Gaussian. We argue that a Gaussian policy parameterization is not desirable for every environment. One such case is tasks that involve predicting a 3D rotation output, either in isolation or coupled with translation as part of a full 6D pose output. Our proposed Bingham Policy Parameterization (BPP) models the Bingham distribution and allows for better rotation (quaternion) prediction than a Gaussian policy parameterization in a range of reinforcement learning tasks. We evaluate BPP on the rotation Wahba problem task, as well as on a set of vision-based next-best pose robot manipulation tasks from RLBench. We hope that this paper encourages more research into developing other policy parameterizations that are better suited to particular environments, rather than always assuming Gaussian.
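To illustrate why a Bingham distribution may suit quaternion outputs, here is a minimal sketch of its unnormalized log-density on unit quaternions, using the standard form p(q) ∝ exp(qᵀ M Z Mᵀ q). This is not the paper's implementation: the function name `bingham_unnormalized_log_density`, the choice of `M` as the identity, and the concentration values in `z` are assumptions made only for this example, and the normalizing constant is omitted.

```python
import numpy as np

def bingham_unnormalized_log_density(q, M, z):
    """Return q^T (M diag(z) M^T) q for a unit quaternion q.

    Hypothetical helper for illustration only.
    M: 4x4 orthogonal matrix; the column paired with the largest
       entry of z is the mode of the distribution.
    z: length-4 concentration vector, here with entries <= 0.
    """
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)        # project onto the unit sphere
    A = M @ np.diag(z) @ M.T         # symmetric parameter matrix
    return float(q @ A @ q)

# Assumed example parameters: mode at the identity rotation.
M = np.eye(4)
z = np.array([0.0, -10.0, -10.0, -10.0])

identity = np.array([1.0, 0.0, 0.0, 0.0])
other = np.array([0.0, 1.0, 0.0, 0.0])

log_p_mode = bingham_unnormalized_log_density(identity, M, z)
log_p_antipode = bingham_unnormalized_log_density(-identity, M, z)
log_p_other = bingham_unnormalized_log_density(other, M, z)
```

Because the density is quadratic in q, the quaternions q and -q receive the same value, which matches the fact that they encode the same 3D rotation. A Gaussian over the four quaternion components has no such built-in symmetry.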

Code Implementations: 1 repo
