SD · AI · LG · AS · May 2, 2024

TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

arXiv:2405.01242v3 · 28 citations · h-index: 14 · Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Originality: Highly original
AI Analysis

The work addresses the impracticality of bone conduction speech enhancement on mobile and wearable devices by enabling efficient, high-quality processing that also improves battery life by up to 160%.

The paper tackles bone conduction speech enhancement on mobile and wearable platforms by proposing TRAMBA, a hybrid transformer-Mamba architecture that improves on state-of-the-art GANs by up to 7.3% in PESQ and 1.8% in STOI while reducing memory footprint by an order of magnitude and speeding up inference by up to 465 times.

We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state-of-the-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-the-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speedup of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB.
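
The pre-train-then-fine-tune recipe described in the abstract can be illustrated with a toy sketch. This is not TRAMBA and not the authors' code: a one-dimensional linear model stands in for the network, synthetic numbers stand in for speech, and every name here (`make_data`, `train`, `mse`, the dataset sizes, and the step counts) is a hypothetical choice made only for illustration. It shows the general idea that starting from weights learned on plentiful data can help when only a few target-domain samples and a few update steps are available.

```python
import random

def mse(params, data):
    """Mean squared error of the linear model y = w*x + b on a dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(params, data, steps, lr=0.1):
    """Full-batch gradient descent on y = w*x + b, starting from params."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b

def make_data(rng, n, w, b, noise=0.01):
    """Synthetic (x, y) pairs drawn from y = w*x + b plus Gaussian noise."""
    out = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        out.append((x, w * x + b + rng.gauss(0, noise)))
    return out

rng = random.Random(0)
# Assumption: a large "source" set stands in for widely available audio speech data.
audio = make_data(rng, 1000, 2.0, 1.0)
# Assumption: a small, slightly shifted "target" set stands in for scarce bone conduction data.
bone_train = make_data(rng, 10, 2.2, 0.8)
bone_test = make_data(rng, 200, 2.2, 0.8)

pretrained = train((0.0, 0.0), audio, steps=500)       # pre-train on plentiful data
fine_tuned = train(pretrained, bone_train, steps=5)    # brief fine-tuning on scarce data
from_scratch = train((0.0, 0.0), bone_train, steps=5)  # same budget, no pre-training

print("fine-tuned test MSE:  ", mse(fine_tuned, bone_test))
print("from-scratch test MSE:", mse(from_scratch, bone_test))
```

In this toy setting the fine-tuned model reaches a lower test error than the model trained from scratch with the same five update steps, because the pre-trained weights already sit close to the target mapping. How well this carries over to real speech models depends on how similar the two domains are.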
