Visionary: Vision architecture discovery for robot learning
This work addresses the challenge of optimizing vision-based control for robots with an automated approach to architecture design; its contribution is incremental, extending architecture search to real-robot tasks.
The paper tackles the problem of designing vision architectures for robot manipulation by proposing an algorithm that discovers interactions between low-dimensional actions and high-dimensional visual inputs, resulting in improved task success rates and a 6% grasping performance gain on a real robot.
We propose a vision-based architecture search algorithm for robot manipulation learning, which discovers interactions between low-dimensional action inputs and high-dimensional visual inputs. Our approach automatically designs architectures while training on the task, discovering novel ways of combining and attending to image feature representations with actions as well as with features from previous layers. The resulting architectures achieve better task success rates, in some cases by a large margin, than a recent high-performing baseline. Our real-robot experiments also confirm that the approach improves grasping performance by 6%. This is the first approach to demonstrate successful neural architecture search and attention connectivity search on a real-robot task.
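To make the idea of a connectivity search concrete, the following is a minimal sketch of what sampling one candidate from such a search space might look like. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not specify the search space or the search procedure, so the names `ACTION_OPS`, `LayerSpec`, and `sample_architecture`, the four fusion choices, and the use of plain random sampling are all hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical ways a layer may fuse the low-dimensional action vector
# with image features; the paper's actual set of operations may differ.
ACTION_OPS = ("none", "add", "concat", "attend")


@dataclass(frozen=True)
class LayerSpec:
    inputs: tuple    # indices of earlier outputs feeding this layer (-1 is the image)
    action_op: str   # how the action vector is fused at this layer


def sample_architecture(num_layers, rng, max_inputs=2):
    """Draw one candidate: each layer picks earlier outputs and an action op."""
    layers = []
    for i in range(num_layers):
        candidates = list(range(-1, i))  # the image plus layers 0..i-1
        k = rng.randint(1, min(max_inputs, len(candidates)))
        inputs = tuple(sorted(rng.sample(candidates, k)))
        layers.append(LayerSpec(inputs, rng.choice(ACTION_OPS)))
    return layers


# Random sampling stands in here for whatever search strategy is actually used.
rng = random.Random(0)
arch = sample_architecture(4, rng)
for i, layer in enumerate(arch):
    print(i, layer.inputs, layer.action_op)
```

In a full search, each sampled specification would be built into a network, trained on the manipulation task, and scored by task success; the sketch covers only the sampling step.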