Visual search and recognition for robot task execution and monitoring
This work addresses the challenge of autonomous visual search and failure recovery for robots, though it appears incremental because it builds on existing methods: deep reinforcement learning, convolutional object detection, and classical planning.
The authors tackle the problem of enabling robots to visually search for targets and monitor task execution. They propose a framework that combines deep reinforcement learning, used to learn scene structure, with a convolutional network for detecting objects and the relations between them. In their experiments, the framework allowed the robot to complete simple tasks and recover from failures autonomously.
Visually searching the environment for relevant targets is a crucial robot skill. We propose a preliminary framework for monitoring the execution of a robot task, which handles the robot's ability to visually search the environment for the targets involved in the task. Visual search is also relevant for recovering from a failure. The framework uses deep reinforcement learning to acquire a "common sense" scene structure, and a deep convolutional network to detect objects and the relevant relations holding between them. Building on these methods, it introduces vision-based execution monitoring, with classical planning as the backbone for task execution. Experiments show that, with the proposed vision-based execution monitor, the robot can complete simple tasks and recover from failures autonomously.
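
The control flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Step`, `detect`, `visual_search`, and `run_monitor` are hypothetical, the object detector is replaced by a dictionary lookup, and the learned search policy is replaced by a fixed ordering of candidate locations. The sketch only shows how a monitor might check, before each planned action, that the target is in view, and fall back to visual search when it is not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str  # symbolic action name taken from the plan (hypothetical)
    target: str  # object that must be in view before the action runs


def detect(world, location):
    # Stand-in for the object detector (assumption): a plain lookup of
    # the objects visible at a location.
    return world.get(location, set())


def visual_search(world, target, search_order):
    # Stand-in for the learned search policy (assumption): visit the
    # candidate locations in a fixed order until the target is seen.
    for location in search_order:
        if target in detect(world, location):
            return location
    return None


def run_monitor(plan, world, start, search_order):
    # Check each step's target before acting; search for it on failure.
    location = start
    log = []
    for step in plan:
        if step.target not in detect(world, location):
            found = visual_search(world, step.target, search_order)
            if found is None:
                log.append(f"abort: {step.target} not found")
                return False, log
            location = found
            log.append(f"recovered: {step.target} at {found}")
        log.append(f"executed: {step.action}")
    return True, log


# Toy scene: the cup is in view at the start, the bottle is not.
world = {"table": {"cup"}, "shelf": {"bottle"}}
plan = [Step("grasp", "cup"), Step("grasp", "bottle")]
ok, log = run_monitor(plan, world, start="table", search_order=["table", "shelf"])
```

In this toy scene the second step fails its visibility check at the table, the search finds the bottle on the shelf, and execution resumes. A real monitor would re-plan or re-observe after each action; the sketch omits that for brevity.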