CV, MM, RO, SD, AS
May 17, 2021

The Boombox: Visual Reconstruction from Acoustic Vibrations

arXiv:2105.08052v2
13 citations
AI Analysis

This work addresses a fundamental robotics task, interacting with bins and containers, and offers a solution for scenarios where visual sensing is inadequate.

The paper tackles state estimation for objects inside containers, a setting where visual sensors are often limited by occlusions and poor lighting. It introduces The Boombox, a container that picks up the acoustic vibrations caused by collisions and uses a convolutional network to reconstruct the visual scene from them, achieving state estimation with low-cost audio sensors.

Interacting with bins and containers is a fundamental task in robotics, making state estimation of the objects inside the bin critical. While robots often use cameras for state estimation, the visual modality is not always ideal due to occlusions and poor illumination. We introduce The Boombox, a container that uses sound to estimate the state of the contents inside a box. Based on the observation that the collision between objects and their container causes an acoustic vibration, we present a convolutional network for learning to reconstruct visual scenes. Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multimodal data enables state estimation from affordable audio sensors. Due to the many ways that robots use containers, we believe the box will have a number of applications in robotics. Our project website is at: boombox.cs.columbia.edu
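
To make the input side of such a pipeline concrete, here is a minimal sketch of how multi-microphone vibration recordings might be turned into an image-like array that a convolutional network could consume. It is not the authors' implementation: the abstract does not specify the audio representation, so the log-magnitude spectrogram, the four-microphone setup, the 44.1 kHz sample rate, the window and hop sizes, and the function names `log_spectrogram` and `stack_microphones` are all assumptions made for illustration.

```python
import numpy as np


def log_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude short-time Fourier transform of a 1-D signal.

    Returns an array of shape (n_fft // 2 + 1, n_frames).
    The window and hop sizes are illustrative, not taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range and keeps values non-negative.
    return np.log1p(magnitude).T


def stack_microphones(recordings, n_fft=256, hop=128):
    """Turn (n_mics, n_samples) audio into (n_mics, freq_bins, n_frames).

    Each microphone becomes one input channel, analogous to the colour
    channels of an image fed to a convolutional network.
    """
    return np.stack([log_spectrogram(ch, n_fft, hop) for ch in recordings])


# Hypothetical input: 4 microphones, 0.1 s of synthetic noise at 44.1 kHz.
rng = np.random.default_rng(0)
recordings = rng.standard_normal((4, 4410))
features = stack_microphones(recordings)
print(features.shape)
```

Under these assumptions, each collision would yield one multi-channel array, and a network would be trained to map it to an image of the scene, with camera frames serving as supervision only during training.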

Code Implementations: 1 repo