CV, RO | Mar 6, 2018

Hybrid Multi-camera Visual Servoing to Moving Target

arXiv:1803.02285v2 | 23 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a specific robotics problem: precise manipulation under dynamic and occluded conditions. It represents an incremental improvement over existing methods.

The paper tackles the challenge of accurately guiding a robot arm to dynamically moving targets under partial occlusion. It proposes a hybrid multi-camera visual servoing approach and reports good performance in four experiments: tracking a ball, targeting a bulls-eye, guiding a straw to a mouth, and delivering an item to a moving hand.

Visual servoing is a well-known task in robotics. However, there are still challenges when multiple visual sources are combined to accurately guide the robot or occlusions appear. In this paper we present a novel visual servoing approach using hybrid multi-camera input data to lead a robot arm accurately to dynamically moving target points in the presence of partial occlusions. The approach uses four RGBD sensors as Eye-to-Hand (EtoH) visual input, and an arm-mounted stereo camera as Eye-in-Hand (EinH). A Master supervisor task selects between using the EtoH or the EinH, depending on the distance between the robot and target. The Master also selects the subset of EtoH cameras that best perceive the target. When the EinH sensor is used, if the target becomes occluded or goes out of the sensor's view-frustum, the Master switches back to the EtoH sensors to re-track the object. Using this adaptive visual input data, the robot is then controlled using an iterative planner that uses position, orientation and joint configuration to estimate the trajectory. Since the target is dynamic, this trajectory is updated every time-step. Experiments show good performance in four different situations: tracking a ball, targeting a bulls-eye, guiding a straw to a mouth and delivering an item to a moving hand. The experiments cover both simple situations such as a ball that is mostly visible from all cameras, and more complex situations such as the mouth which is partially occluded from some of the sensors.
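
The Master's switching rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function name `select_sensors`, the 0.3 m switch distance, the per-camera visibility scores, the `top_k` subset size, and the assumption that the EinH camera is preferred at close range are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EinHStatus:
    # False when the target is occluded or outside the stereo camera's view frustum.
    target_visible: bool


def select_sensors(
    distance_m: float,
    einh: EinHStatus,
    etoh_scores: Dict[str, float],
    switch_distance_m: float = 0.3,  # hypothetical threshold, not from the paper
    top_k: int = 2,                  # hypothetical subset size, not from the paper
) -> Tuple[str, List[str]]:
    """Pick the visual source for the current time step.

    Assumption: the arm-mounted stereo camera (EinH) is used when the robot
    is close to the target and can see it; otherwise the Master falls back
    to the best-scoring subset of the fixed RGBD cameras (EtoH).
    """
    if distance_m <= switch_distance_m and einh.target_visible:
        return "EinH", ["stereo"]

    # Rank EtoH cameras by a hypothetical visibility score, best first.
    ranked = sorted(etoh_scores, key=etoh_scores.get, reverse=True)
    return "EtoH", ranked[:top_k]


scores = {"cam0": 0.2, "cam1": 0.9, "cam2": 0.5, "cam3": 0.1}
far = select_sensors(1.0, EinHStatus(True), scores)
near = select_sensors(0.1, EinHStatus(True), scores)
near_occluded = select_sensors(0.1, EinHStatus(False), scores)
```

Because the target moves, a rule like this would be re-evaluated at every time step, so an occlusion of the stereo camera immediately hands tracking back to the EtoH sensors.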
