SD AS Sep 27, 2021

Inferring Facing Direction from Voice Signals

arXiv:2109.13094v2
AI Analysis

This paper addresses the challenge of making only the intended device respond when several voice assistants hear the same command. It is an incremental step, and the authors acknowledge that improvements are still needed.

The paper tackles the problem of detecting which device a user is facing when giving a voice command in a multi-device environment. Its solution, CoDIR, combines a new algorithm that estimates line-of-sight power from the voice signal with beamforming and triangulation. The authors describe the results, from 500+ configurations across 5 rooms and 9 users, as encouraging.

Consider a home or office where multiple devices are running voice assistants (e.g., TVs, lights, ovens, refrigerators). A human user turns to a particular device and gives a voice command, such as "Alexa, can you ...". This paper focuses on the problem of detecting which device the user was facing, and therefore enabling only that device to respond to the command. Our core intuition emerges from the fact that human voice exhibits a directional radiation pattern, and the orientation of this pattern should influence the signal received at each device. Unfortunately, indoor multipath, unknown user location, and unknown voice signals pose critical hurdles. Using a new algorithm that estimates the line-of-sight (LoS) power from a given signal, combined with beamforming and triangulation, we design a functional solution called CoDIR. Results from 500+ configurations, across 5 rooms and 9 different users, are encouraging. While improvements are necessary, we believe this is an important step forward in a challenging but urgent problem space.
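The intuition in the abstract can be illustrated with a small sketch. This is not CoDIR itself: the abstract does not describe how LoS power is estimated or how the final decision is made. The sketch assumes that a per-device LoS power estimate and a user-to-device distance (for example, from triangulation) are already available, and it assumes a simple free-field 1/d^2 spreading model. The function name `facing_device` and all numbers are hypothetical.

```python
def facing_device(los_power, distance_m):
    """Pick the device with the largest distance-compensated LoS power.

    Assumption (not from the paper): LoS power decays as 1/d^2, so
    multiplying by d^2 refers every measurement to a common distance.
    What remains should mainly reflect the directional radiation
    pattern of the voice.
    """
    scores = {}
    for name, power in los_power.items():
        d = distance_m[name]
        scores[name] = power * d ** 2
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical measurements: LoS power (arbitrary linear units) and
# estimated user-to-device distance in metres.
los_power = {"tv": 0.020, "lamp": 0.045, "oven": 0.010}
distance_m = {"tv": 3.0, "lamp": 1.5, "oven": 2.0}

best, scores = facing_device(los_power, distance_m)
print(best, scores)
```

In this made-up example the lamp receives the most raw LoS power only because it is closest to the user. After compensating for distance, the TV scores highest, which is why an estimate of the user's location matters alongside the LoS power.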

