RO | CV | Jul 1, 2024

Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models

arXiv:2407.00985v1 | 1 citation | h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses accurate object segmentation from natural language instructions for domestic service robots. It is an incremental improvement built on specific technical enhancements.

The paper tackles the problem of generating segmentation masks for target objects from open-vocabulary manipulation instructions for domestic service robots. It targets two failure modes of conventional methods: target objects that lie outside the camera's field of view, and polygons whose vertices are listed in a different order but describe the same shape. The proposed method achieved a +16.32% improvement over a representative polygon-based method on a new dataset.

We consider the task of generating segmentation masks for the target object from an object manipulation instruction, which allows users to give open vocabulary instructions to domestic service robots. Conventional segmentation generation approaches often fail to account for objects outside the camera's field of view and cases in which the order of vertices differs but still represents the same polygon, which leads to erroneous mask generation. In this study, we propose a novel method that generates segmentation masks from open vocabulary instructions. We implement a novel loss function using optimal transport to prevent significant loss where the order of vertices differs but still represents the same polygon. To evaluate our approach, we constructed a new dataset based on the REVERIE dataset and Matterport3D dataset. The results demonstrated the effectiveness of the proposed method compared with existing mask generation methods. Remarkably, our best model achieved a +16.32% improvement on the dataset compared with a representative polygon-based method.
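The vertex-order issue in the abstract can be illustrated with a small sketch. This is not the paper's loss function, whose exact formulation the abstract does not give. It is a generic entropy-regularised optimal transport (Sinkhorn) cost between two vertex sets, assuming uniform mass per vertex and coordinates normalised to [0, 1]. The function names and the `eps` and `n_iters` parameters are illustrative assumptions.

```python
import numpy as np


def ordered_l1_loss(pred, target):
    """Order-sensitive baseline: compares vertex i of pred with vertex i of target."""
    return float(np.abs(pred - target).sum(axis=1).mean())


def sinkhorn_polygon_loss(pred, target, eps=0.05, n_iters=200):
    """Entropy-regularised OT cost between two vertex sets (illustrative sketch).

    Assumes uniform mass on every vertex and coordinates in [0, 1].
    eps and n_iters are hypothetical defaults, not values from the paper.
    """
    n, m = len(pred), len(target)
    # Pairwise squared Euclidean distance between every pred/target vertex pair.
    cost = ((pred[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    a = np.full(n, 1.0 / n)  # mass on predicted vertices
    b = np.full(m, 1.0 / m)  # mass on target vertices
    kernel = np.exp(-cost / eps)
    u = np.ones(n)
    for _ in range(n_iters):
        # Alternate scaling so the transport plan matches both marginals.
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum())


# A unit square, and the same square with its vertex list rotated by one position.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
rotated = np.roll(square, 1, axis=0)

baseline = ordered_l1_loss(square, rotated)     # large: vertices are misaligned by index
ot_cost = sinkhorn_polygon_loss(square, rotated)  # near zero: same set of points
```

The index-wise loss penalises the rotated vertex list heavily even though both arrays describe the same polygon, while the transport cost stays close to zero because it matches vertices by position rather than by index.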

