RO, AI, LG
Sep 19, 2019

Robot Sound Interpretation: Combining Sight and Sound in Learning-Based Control

arXiv:1909.09172v2, 14 citations
AI Analysis

This work addresses the challenge of integrating sound and vision for robot control, offering a novel approach that could enhance human-robot interaction, though it appears incremental in combining existing techniques.

The paper tackles the problem of enabling robots to interpret sound commands for visual-based decision making by proposing an end-to-end deep neural network trained with reinforcement learning and auxiliary losses. It demonstrates effectiveness on two robots, achieving generalization to sound types and tasks, and successfully transfers a policy from simulation to a real-world TurtleBot3.

We explore the interpretation of sound for robot decision making, inspired by human speech comprehension. While previous methods separate the sound processing unit from the robot controller, we propose an end-to-end deep neural network which directly interprets sound commands for vision-based decision making. The network is trained using reinforcement learning with auxiliary losses on the sight and sound networks. We demonstrate our approach on two robots, a TurtleBot3 and a Kuka-IIWA arm, which hear a command word, identify the associated target object, and perform precise control to reach the target. For both robots, we show empirically that our network generalizes across sound types and robotic tasks. We successfully transfer the policy learned in a simulator to a real-world TurtleBot3.
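The abstract says the network is trained with a reinforcement learning objective plus auxiliary losses on the sight and sound networks. As a rough illustration only, the sketch below shows one way such a combined objective could be formed. Everything specific in it is an assumption and is not taken from the paper: the choice of cross-entropy on a command-word label for the sound branch, mean squared error on a predicted target position for the sight branch, the weights `w_sound` and `w_sight`, and all function names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-likelihood of the true class index `label`."""
    return -math.log(softmax(logits)[label])


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(rl_loss, sound_logits, word_label,
                  sight_pred, sight_target,
                  w_sound=0.5, w_sight=0.5):
    """Hypothetical total loss: RL term plus weighted auxiliary terms.

    Assumptions (not from the paper): the sound branch is supervised
    with a command-word classification loss, the sight branch with a
    regression loss on the target position, and both are added to the
    RL loss with fixed weights.
    """
    sound_aux = cross_entropy(sound_logits, word_label)
    sight_aux = mse(sight_pred, sight_target)
    return rl_loss + w_sound * sound_aux + w_sight * sight_aux


# Example call with made-up numbers: a two-word vocabulary and a 2-D target.
loss = combined_loss(
    rl_loss=1.0,
    sound_logits=[0.0, 0.0], word_label=0,
    sight_pred=[1.0, 2.0], sight_target=[1.0, 2.0],
)
```

In an end-to-end setup, a single scalar of this kind would be backpropagated through both encoders and the policy head together, which is what distinguishes it from a pipeline that trains the sound processing unit separately.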

