Learning rich touch representations through cross-modal self-supervision
This work aims to enhance robot manipulation skills by leveraging touch sensing, which is fundamental but underutilized. The contribution appears incremental, however, since it benchmarks existing self-supervised methods on new datasets.
The paper tackles learning rich touch features from cross-modal self-supervision for robot manipulation. It introduces two new datasets collected with a simulated robotic hand and shows that cross-modal self-supervision improves touch representations, as measured by few-shot classification on unseen objects and poses.
The sense of touch is fundamental in several manipulation tasks, but it is rarely used in robot manipulation. In this work we tackle the problem of learning rich touch features from cross-modal self-supervision. We evaluate these features by identifying objects and their properties in a few-shot classification setting. We introduce two new datasets, collected with a simulated anthropomorphic robotic hand equipped with tactile sensors, that cover both synthetic and daily-life objects. Several self-supervised learning methods are benchmarked on these datasets by evaluating few-shot classification on unseen objects and poses. Our experiments indicate that cross-modal self-supervision effectively improves touch representations and, in turn, has great potential to enhance robot manipulation skills.
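To make the few-shot evaluation concrete, the following is a minimal sketch of one common way to score learned features: a nearest-prototype classifier over frozen embeddings. The abstract does not state which classifier the authors use, so the choice of class prototypes and cosine similarity is an assumption, and all function names here are hypothetical. The embeddings are assumed to come from a touch encoder that was already trained with self-supervision.

```python
from collections import defaultdict
import math


def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype vector."""
    grouped = defaultdict(list)
    for vec, label in zip(support_embeddings, support_labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def few_shot_predict(prototypes, query):
    """Return the label whose prototype is most similar to the query embedding."""
    return max(prototypes, key=lambda label: cosine_similarity(prototypes[label], query))


def few_shot_accuracy(support_embeddings, support_labels, query_embeddings, query_labels):
    """Fraction of query embeddings assigned to their true label."""
    prototypes = class_prototypes(support_embeddings, support_labels)
    correct = sum(
        few_shot_predict(prototypes, q) == y
        for q, y in zip(query_embeddings, query_labels)
    )
    return correct / len(query_labels)
```

In this sketch, the support set would hold the few labeled touch embeddings per class, while the query set would hold embeddings of held-out objects or poses. A better representation should then yield higher accuracy without retraining the encoder.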