Sample Efficient Grasp Learning Using Equivariant Models
This work improves sample efficiency in robotic grasp learning, which makes training on a physical robot faster and more practical. The contribution is incremental, since it applies an existing equivariant method to a specific domain.
The paper tackles planar grasp detection by recognizing that the optimal grasp function is SE(2)-equivariant and modeling it with an equivariant convolutional neural network. This significantly improves sample efficiency: only 600 grasp attempts are needed to learn a good approximation of the grasp function, so learning can run entirely on a physical robot in about 1.5 hours.
In planar grasp detection, the goal is to learn a function from an image of a scene onto a set of feasible grasp poses in $\mathrm{SE}(2)$. In this paper, we recognize that the optimal grasp function is $\mathrm{SE}(2)$-equivariant and can be modeled using an equivariant convolutional neural network. As a result, we are able to significantly improve the sample efficiency of grasp learning, obtaining a good approximation of the grasp function after only 600 grasp attempts. This is few enough that we can learn to grasp completely on a physical robot in about 1.5 hours.
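To make the equivariance property concrete: a grasp function $f$ is equivariant when $f(g \cdot x) = g \cdot f(x)$, so transforming the scene by $g$ transforms the predicted grasps in the same way. The sketch below checks this numerically on a toy stand-in, not on the network from the paper. It rests on several assumptions that are not in the original: rotations are restricted to multiples of 90 degrees (exact on a pixel grid), the "grasp function" is a hand-built bank of four rotated copies of one filter, and the names `grasp_quality_maps` and `act_on_output` are hypothetical.

```python
import numpy as np


def correlate2d_same(image, kernel):
    """Zero-padded 'same' cross-correlation, written out to avoid extra dependencies."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))  # constant zero padding
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


def grasp_quality_maps(image, base_kernel):
    """Toy grasp function (an assumption, not the paper's model).

    Returns an array of shape (4, H, W): channel k is a per-pixel score for a
    gripper orientation of k * 90 degrees, computed with the base kernel
    rotated by the same angle.
    """
    return np.stack(
        [correlate2d_same(image, np.rot90(base_kernel, k)) for k in range(4)]
    )


def act_on_output(maps):
    """How a 90-degree scene rotation acts on the output.

    Each map is rotated spatially, and the orientation channels shift by one,
    because every grasp angle also increases by 90 degrees.
    """
    rotated = np.rot90(maps, k=1, axes=(1, 2))
    return np.roll(rotated, shift=1, axis=0)


rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))  # odd size so the kernel has a well-defined center

rotate_then_predict = grasp_quality_maps(np.rot90(image), kernel)
predict_then_rotate = act_on_output(grasp_quality_maps(image, kernel))

# Equivariance: both orders of operations give the same grasp maps.
print(np.allclose(rotate_then_predict, predict_then_rotate))
```

Because the structure guarantees this relation for any filter values, a grasp outcome observed at one orientation also informs the prediction at the rotated orientations. This is the intuition behind the sample-efficiency gain described above, although the paper's model covers a richer set of rotations than this four-way toy.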