Do Autonomous Agents Benefit from Hearing?
This work addresses the problem of enhancing autonomous agents' sensory capabilities and is aimed at researchers in reinforcement learning, though it is incremental in that it extends existing multimodal approaches.
The paper tackles the limitation that vision-only agents in deep reinforcement learning cannot sense audible cues, and finds that adding audio features to visual information improves agent behavior in reach-the-goal tasks in the ViZDoom environment.
Mapping states to actions in deep reinforcement learning is mainly based on visual information. The common approach to handling visual information is to extract pixels from images and use them as the state representation for a reinforcement learning agent. However, any vision-only agent is handicapped by its inability to sense audible cues. Using hearing, animals are able to sense targets that are outside their visual range. In this work, we propose using audio as information complementary to vision in the state representation. We assess the impact of such a multi-modal setup in reach-the-goal tasks in the ViZDoom environment. Results show that the agent improves its behavior when visual information is accompanied by audio features.
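To make the idea of a multi-modal state concrete, the following is a minimal sketch of one way to combine pixels and audio into a single state vector. It is not the paper's method: the abstract does not specify how features are extracted or fused, so the block-averaged image, the band-energy audio summary, simple concatenation, and all function names and parameters (`visual_features`, `audio_features`, `multimodal_state`, `size`, `n_bands`) are assumptions made for illustration only.

```python
import numpy as np


def visual_features(frame: np.ndarray, size: int = 8) -> np.ndarray:
    """Downsample a grayscale frame (H, W) to size x size by block averaging.

    Hypothetical stand-in for a learned visual encoder; pixel values are
    assumed to lie in [0, 255] and are scaled to [0, 1].
    """
    h, w = frame.shape
    bh, bw = h // size, w // size
    cropped = frame[: bh * size, : bw * size]
    blocks = cropped.reshape(size, bh, size, bw).mean(axis=(1, 3))
    return (blocks / 255.0).ravel()


def audio_features(samples: np.ndarray, n_bands: int = 4) -> np.ndarray:
    """Summarize a mono audio window as log energy in n_bands frequency bands.

    Hypothetical audio descriptor chosen for brevity, not taken from the paper.
    """
    power = np.abs(np.fft.rfft(samples)) ** 2
    bands = np.array_split(power, n_bands)
    return np.log1p(np.array([band.sum() for band in bands]))


def multimodal_state(frame: np.ndarray, samples: np.ndarray) -> np.ndarray:
    """Concatenate visual and audio features into one state vector."""
    return np.concatenate([visual_features(frame), audio_features(samples)])


# Example: a random 64x64 frame and a low-frequency tone of 1024 samples.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
tone = np.sin(2 * np.pi * 10 * np.arange(1024) / 1024)
state = multimodal_state(frame, tone)
```

Under these assumptions, the resulting vector has one entry per image block followed by one entry per audio band, and it could be fed to any policy or value network in place of a pixels-only state. A sound source outside the field of view would change only the audio entries, which is the kind of cue a vision-only agent cannot use.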