CV, LG | Mar 11, 2021

Unknown Object Segmentation from Stereo Images

arXiv:2103.06796v1 | 40 citations
Originality: Incremental advance
AI Analysis

This work addresses a key limitation for robots in dynamic environments by enabling instance segmentation without relying on known object categories. The advance is incremental, since it builds on existing transformer architectures and stereo sensing.

The paper tackles the segmentation of unknown object instances from stereo images without prior semantic or geometric information. It shows that the proposed Instance Stereo Transformer (INSTR) algorithm outperforms current state-of-the-art depth-based methods in experiments across multiple domains.

Although instance-aware perception is a key prerequisite for many autonomous robotic applications, most methods only partially solve the problem by focusing solely on known object categories. However, for robots interacting in dynamic and cluttered environments, this is not realistic and severely limits the range of potential applications. Therefore, we propose a novel object instance segmentation approach that does not require any semantic or geometric information of the objects beforehand. In contrast to existing works, we do not explicitly use depth data as input, but rely on the insight that slight viewpoint changes, such as those provided by stereo image pairs, are often sufficient to determine object boundaries and thus to segment objects. Focusing on the versatility of stereo sensors, we employ a transformer-based architecture that maps directly from the pair of input images to the object instances. This has the major advantage that, instead of computing the segmentation on a noisy and potentially incomplete depth map, we use the original image pair to infer the object instances and a dense depth map. In experiments in several different application domains, we show that our Instance Stereo Transformer (INSTR) algorithm outperforms current state-of-the-art methods that are based on depth maps. Training code and pretrained models will be made available.
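The insight that slight viewpoint changes reveal object boundaries can be illustrated with the standard pinhole stereo relation d = f * B / Z: nearer surfaces shift more between the left and right image, so disparity jumps where an object ends. The sketch below is not the INSTR method, which learns this mapping end to end from the image pair and does not take depth as input. It is a toy example on a synthetic scanline, and the function names, focal length, baseline, and threshold are all assumptions chosen for illustration.

```python
import numpy as np


def disparity_from_depth(depth_m, focal_px, baseline_m):
    """Pinhole stereo relation d = f * B / Z, returned in pixels."""
    return focal_px * baseline_m / depth_m


def boundary_columns(disparity, min_jump_px):
    """Indices i where disparity changes by more than min_jump_px
    between column i and column i + 1."""
    jumps = np.abs(np.diff(disparity))
    return np.flatnonzero(jumps > min_jump_px)


# Synthetic scanline (assumed values): background at 2.0 m,
# one object at 0.5 m occupying columns 4 to 7.
depth_row = np.full(12, 2.0)
depth_row[4:8] = 0.5

# Hypothetical camera: 500 px focal length, 0.1 m baseline.
disp_row = disparity_from_depth(depth_row, focal_px=500.0, baseline_m=0.1)

# Background disparity is 25 px, object disparity is 100 px,
# so the 75 px jumps mark the left and right object edges.
edges = boundary_columns(disp_row, min_jump_px=10.0)
print(edges)
```

In this toy case the disparity jumps fall exactly at the object edges. Real stereo pairs add occlusions, textureless regions, and matching noise, which is part of the motivation the abstract gives for inferring instances directly from the images rather than from a precomputed depth map.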

Code Implementations: 2 repos
