SD · MM · AS · Dec 24, 2021

Enabling Real-time On-chip Audio Super Resolution for Bone Conduction Microphones

arXiv:2112.13156v1 · 21 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of delivering high-fidelity audio on low-power hearable devices for users in noisy environments. It is an incremental advance that optimizes existing super resolution techniques for embedded deployment.

The paper tackles the limited bandwidth of bone conduction microphones (BCM) in voice communication by developing a real-time on-chip audio super resolution system. The system achieves higher speech quality than the baseline method, and both expert and amateur listeners in a user study rated it significantly better.

Voice communication using an air conduction microphone in noisy environments suffers from degraded speech audibility. Bone conduction microphones (BCM) are robust against ambient noise but suffer from limited effective bandwidth due to their sensing mechanism. Although existing audio super resolution algorithms can recover the lost high-frequency content to achieve high-fidelity audio, they require considerably more computational resources than are available in low-power hearable devices. This paper proposes the first-ever real-time on-chip speech audio super resolution system for BCM. To accomplish this, we built and compared a series of lightweight audio super resolution deep learning models. Among all these models, ATS-UNet is the most cost-efficient because the proposed novel Audio Temporal Shift Module (ATSM) reduces the network's dimensionality while maintaining sufficient temporal features from speech audio. We then quantized the ATS-UNet and deployed it to low-end ARM microcontroller units for real-time embedded prototypes. Evaluation results show that our system achieved real-time inference speed on Cortex-M7 and higher quality than the baseline audio super resolution method. Finally, we conducted a user study with ten expert and ten amateur listeners to evaluate our method's effectiveness for human listeners. Both groups perceived significantly higher speech quality with our method than with the original BCM or with an air conduction microphone using cutting-edge noise reduction algorithms.
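The abstract does not spell out how the Audio Temporal Shift Module works. As a rough illustration only, the sketch below shows the generic temporal shift idea known from video models: a fraction of the feature channels is shifted by one frame along the time axis so that a cheap per-frame operation can still mix information across neighbouring frames. The function name `temporal_shift`, the `fold_div` parameter, and the `(channels, frames, freq_bins)` layout are assumptions made for this example and may not match the paper's ATSM.

```python
import numpy as np


def temporal_shift(x, fold_div=4):
    """Generic temporal shift on a (channels, frames, freq_bins) array.

    Hypothetical sketch, not the paper's ATSM implementation.
    One 1/fold_div slice of the channels takes its values from the next
    frame, another 1/fold_div slice takes them from the previous frame,
    and the remaining channels are left untouched. Frames shifted in from
    outside the clip are zero-filled.
    """
    channels = x.shape[0]
    fold = channels // fold_div
    out = np.zeros_like(x)

    # Slice 1: frame t receives features from frame t + 1 (needs look-ahead).
    out[:fold, :-1] = x[:fold, 1:]
    # Slice 2: frame t receives features from frame t - 1 (causal).
    out[fold:2 * fold, 1:] = x[fold:2 * fold, :-1]
    # Remaining channels pass through unchanged.
    out[2 * fold:] = x[2 * fold:]
    return out


if __name__ == "__main__":
    features = np.random.rand(8, 16, 32).astype(np.float32)
    shifted = temporal_shift(features)
    print(shifted.shape)
```

The shift itself adds no multiplications or learned parameters, which is why this family of operations is attractive on microcontrollers. In a strictly streaming setting, only the backward (causal) slice can be used without buffering one future frame.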

