RO · AI · Jul 17, 2023

Clarifying the Half Full or Half Empty Question: Multimodal Container Classification

arXiv:2307.08471v1 · 1 citation · h-index: 46
Originality: Synthesis-oriented
AI Analysis

This work addresses multimodal perception for robotics, but it is incremental as it compares existing fusion strategies in a specific use case.

The paper tackles the problem of classifying containers and their contents from multimodal data (visual, tactile, proprioceptive) recorded on a robot, and finds that the best fusion strategy achieves 15% higher accuracy than the best single-modality strategy.

Multimodal integration is key to allowing robots to perceive the world. Multimodality brings several challenges, such as how to integrate and fuse the data. In this paper, we compare different ways of fusing visual, tactile and proprioceptive data. The data is recorded directly on the NICOL robot in an experimental setup in which the robot has to classify containers and their content. Because the containers differ in nature, the use of the modalities can differ widely between the classes. We demonstrate the superiority of multimodal solutions in this use case and evaluate three fusion strategies that integrate the data at different time steps. We find that the accuracy of the best fusion strategy is 15% higher than that of the best strategy using only a single sense.
