LG SD AS Feb 3, 2023

SPADE: Self-supervised Pretraining for Acoustic DisEntanglement

arXiv:2302.01483v1
h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses device arbitration in speech processing by learning acoustic representations. It appears incremental because it builds on prior self-supervised methods for speech disentanglement.

The paper disentangles room acoustics from speech with a self-supervised pretraining approach, which significantly improves device arbitration performance over a baseline when labeled data is scarce.

Self-supervised representation learning approaches have grown in popularity due to the ability to train models on large amounts of unlabeled data and have demonstrated success in diverse fields such as natural language processing, computer vision, and speech. Previous self-supervised work in the speech domain has disentangled multiple attributes of speech such as linguistic content, speaker identity, and rhythm. In this work, we introduce a self-supervised approach to disentangle room acoustics from speech and use the acoustic representation on the downstream task of device arbitration. Our results demonstrate that our proposed approach significantly improves performance over a baseline when labeled training data is scarce, indicating that our pretraining scheme learns to encode room acoustic information while remaining invariant to other attributes of the speech signal.

