ASSD, Jan 11, 2018

Direction of Arrival with One Microphone, a few LEGOs, and Non-Negative Matrix Factorization

arXiv:1801.03740v3, 12 citations
Originality: Incremental advance
AI Analysis

This work addresses monaural sound localization, which could benefit applications such as hearing aids or robotics where multiple microphones are impractical. The advance is incremental: it builds on known human auditory capabilities and existing NMF methods.

The paper tackles sound source localization with a single microphone by using scattering structures, such as LEGO bricks, to create direction-dependent signatures. It localizes arbitrary speakers accurately without speaker-specific training, and supports this with experiments in which non-negative matrix factorization regularizes the ill-posed inverse problem for speech.

Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to the scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way, giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem which we regularize by prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach.
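
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Euclidean NMF cost with multiplicative updates, a fixed pre-learned dictionary `W`, and fully synthetic direction signatures. The names `nmf_activations`, `localize`, `signatures`, and `W` are hypothetical and chosen only for illustration. For each candidate direction, the dictionary is filtered by that direction's magnitude response, non-negative activations are fitted to the observed spectrogram, and the direction with the smallest residual is selected.

```python
import numpy as np

def nmf_activations(V, D, n_iter=500, eps=1e-12):
    """Fit H >= 0 minimizing ||V - D @ H||_F with D held fixed,
    using Lee-Seung multiplicative updates (an assumed cost, for brevity)."""
    rng = np.random.default_rng(0)
    H = rng.random((D.shape[1], V.shape[1])) + eps
    DtV = D.T @ V
    DtD = D.T @ D
    for _ in range(n_iter):
        H *= DtV / (DtD @ H + eps)
    return H

def localize(V, signatures, W, n_iter=500):
    """Return the index of the direction whose filtered dictionary
    best explains the magnitude spectrogram V, plus all residuals."""
    errors = []
    for h in signatures:              # h: per-frequency gain for one direction
        D = h[:, None] * W            # direction-filtered dictionary
        H = nmf_activations(V, D, n_iter)
        errors.append(np.linalg.norm(V - D @ H))
    errors = np.array(errors)
    return int(np.argmin(errors)), errors

# Synthetic demo: all data below is made up, not taken from the paper.
rng = np.random.default_rng(1)
n_freq, n_atoms, n_frames, n_dirs = 64, 8, 40, 6
W = rng.random((n_freq, n_atoms)) ** 3                  # stand-in source dictionary
signatures = np.exp(rng.normal(size=(n_dirs, n_freq)))  # stand-in direction gains
true_dir = 3
H_true = rng.random((n_atoms, n_frames))
V = signatures[true_dir][:, None] * (W @ H_true)        # noiseless observation

est, errs = localize(V, signatures, W)
print("estimated direction index:", est)
```

In this noiseless toy setting the true direction admits an exact non-negative fit, so its residual is driven toward zero while mismatched signatures leave a larger error. Real recordings would add noise, reverberation, and dictionary mismatch, none of which this sketch models.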

Code Implementations: 1 repo
