OtoWorld: Towards Learning to Separate by Learning to Move
This work integrates computer audition with reinforcement learning for navigation tasks. The contribution is incremental: it adapts the GridWorld navigation game to audio and builds on existing open-source libraries.
To enable agents to learn auditory navigation, the researchers created OtoWorld, an interactive environment in which agents must locate and turn off sound sources using only audio input. Preliminary results suggest that agents can complete tasks in this setup.
We present OtoWorld, an interactive environment in which agents must learn to listen in order to solve navigational tasks. The purpose of OtoWorld is to facilitate reinforcement learning research in computer audition, where agents must learn to listen to the world around them to navigate. OtoWorld is built on three open-source libraries: OpenAI Gym for environment and agent interaction, PyRoomAcoustics for ray-tracing and acoustics simulation, and nussl for training deep computer audition models. OtoWorld is the audio analogue of GridWorld, a simple navigation game, and can be easily extended to more complex environments and games. To solve one episode of OtoWorld, an agent must move towards each sounding source in the auditory scene and "turn it off". The agent receives no input other than the current sound of the room. The sources are placed randomly within the room and can vary in number. The agent receives a reward for turning off a source. We present preliminary results on the ability of agents to win at OtoWorld. OtoWorld is open-source and available.
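The episode structure described above can be illustrated with a minimal sketch. It is a toy stand-in written for this summary, not the OtoWorld API: the class name `ToyOtoWorld`, the grid layout, the four-move action set, and the two-value "loudness" observation are all assumptions. In particular, the observation is a crude inverse-square energy at two hypothetical microphones rather than audio simulated with PyRoomAcoustics. Only the Gym-style `reset`/`step` convention and the rules (random source placement, audio-only input, reward for turning off a source, episode ends when all sources are off) follow the abstract.

```python
import random


class ToyOtoWorld:
    """Hypothetical toy environment; NOT the real OtoWorld implementation."""

    # Assumed action set: 0=up, 1=down, 2=left, 3=right on a square grid.
    MOVES = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0)}

    def __init__(self, size=8, num_sources=2, seed=None):
        self.size = size
        self.num_sources = num_sources  # the number of sources can vary
        self.rng = random.Random(seed)
        self.agent = (0, 0)
        self.sources = set()

    def reset(self):
        # Place the agent and the sources on distinct random cells.
        cells = [(x, y) for x in range(self.size) for y in range(self.size)]
        picks = self.rng.sample(cells, self.num_sources + 1)
        self.agent = picks[0]
        self.sources = set(picks[1:])
        return self._observe()

    def _observe(self):
        # Stand-in for "the current sound of the room": summed inverse-square
        # energy of the active sources at a left and a right microphone.
        ax, ay = self.agent

        def energy(mic_x):
            return sum(
                1.0 / (1.0 + (sx - mic_x) ** 2 + (sy - ay) ** 2)
                for sx, sy in self.sources
            )

        return (energy(ax - 0.5), energy(ax + 0.5))

    def step(self, action):
        # Move one cell, clamped to the room boundaries.
        dx, dy = self.MOVES[action]
        x = min(max(self.agent[0] + dx, 0), self.size - 1)
        y = min(max(self.agent[1] + dy, 0), self.size - 1)
        self.agent = (x, y)

        # Reaching a sounding source turns it off and yields a reward.
        reward = 0.0
        if self.agent in self.sources:
            self.sources.remove(self.agent)
            reward = 1.0

        done = not self.sources  # episode is solved when the room is silent
        return self._observe(), reward, done, {}


def run_random_episode(env, max_steps=10_000, seed=0):
    """Roll out a uniformly random policy and return the total reward."""
    rng = random.Random(seed)
    env.reset()
    total = 0.0
    for _ in range(max_steps):
        _, reward, done, _ = env.step(rng.randrange(4))
        total += reward
        if done:
            break
    return total
```

A learning agent would replace the random policy in `run_random_episode` with one that maps the observation to an action. The total reward per episode is bounded by the number of sources, since each source can be turned off only once.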