Top-1 CORSMAL Challenge 2020 Submission: Filling Mass Estimation Using Multi-modal Observations of Human-robot Handovers
This work presents a multi-modal method that lets a robot estimate the filling mass of a container during a human-robot handover. This perception capability is crucial for safe and effective human-robot collaboration.
This paper addresses the problem of estimating the filling mass of a container during human-robot handovers using multi-modal sensor data. The proposed method combines predictions of filling type, filling level, and container capacity, and it achieved the Top-1 overall performance in the CORSMAL 2020 Challenge on both the public and private datasets.
Human-robot object handover is a key skill for the future of human-robot collaboration. The CORSMAL 2020 Challenge focuses on the perception part of this problem: the robot needs to estimate the filling mass of a container held by a human. Although powerful methods exist for image processing and audio processing individually, solving this problem requires processing data from multiple sensors jointly. The appearance of the container, the sound of the filling, and the depth data all provide essential information. We propose a multi-modal method to predict three key indicators of the filling mass: filling type, filling level, and container capacity. These indicators are then combined to estimate the filling mass of the container. Our method obtained the Top-1 overall performance among all submissions to the CORSMAL 2020 Challenge on both the public and private subsets while showing no evidence of overfitting. Our source code is publicly available: https://github.com/v-iashin/CORSMAL
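To make the final combination step concrete, the following is a minimal sketch that assumes the filling mass is the product of container capacity, filling level (as a fraction of capacity), and a per-type density. The function name `filling_mass` and the density table `DENSITY_G_PER_ML` are hypothetical, and the density values are illustrative assumptions rather than figures taken from the paper. In the described method, the three inputs would come from the separate filling-type, filling-level, and capacity predictors.

```python
# Assumed per-type densities in g/mL (illustrative values, not taken from the paper).
DENSITY_G_PER_ML = {
    "none": 0.0,
    "pasta": 0.41,
    "rice": 0.85,
    "water": 1.00,
}


def filling_mass(filling_type: str, filling_level: float, capacity_ml: float) -> float:
    """Hypothetical helper: combine the three predicted indicators into a mass in grams.

    filling_level is a fraction of capacity in [0, 1], e.g. 0.5 for half full.
    """
    if filling_type not in DENSITY_G_PER_ML:
        raise ValueError(f"unknown filling type: {filling_type!r}")
    if not 0.0 <= filling_level <= 1.0:
        raise ValueError("filling_level must be a fraction in [0, 1]")
    if capacity_ml < 0:
        raise ValueError("capacity_ml must be non-negative")
    # Filled volume (mL) times density (g/mL) gives mass (g).
    return capacity_ml * filling_level * DENSITY_G_PER_ML[filling_type]


print(filling_mass("water", 0.5, 500.0))  # prints 250.0
```

Under these assumptions, an error in any single indicator propagates multiplicatively to the mass estimate. For example, predicting "none" as the filling type yields zero mass regardless of the predicted level and capacity.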