IT · AI · SD · AS · SP · Jan 20, 2025

Task and Perception-aware Distributed Source Coding for Correlated Speech under Bandwidth-constrained Channels

arXiv:2501.17879v1 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses a critical challenge for wireless AR/VR applications by enabling efficient speech transmission under bandwidth constraints, though it is incremental as it builds on existing autoencoder-based methods.

The paper tackles the problem of transmitting correlated high-fidelity speech from multiple devices over bandwidth-limited channels by proposing a neural distributed principal component analysis-aided algorithm that balances perceptual realism with task-specific performance. Experiments show significant PSNR improvements over naive autoencoder methods, with gains of 19% in task-agnostic and 52% in task-aware settings, approaching theoretical upper bounds in low-bandwidth scenarios.

Emerging wireless AR/VR applications require real-time transmission of correlated high-fidelity speech from multiple resource-constrained devices over unreliable, bandwidth-limited channels. Existing autoencoder-based speech source coding methods fail to address the following three requirements together: (1) dynamic bitrate adaptation without retraining the model, (2) leveraging correlations among multiple speech sources, and (3) balancing downstream task loss with the realism of reconstructed speech. We propose a neural distributed principal component analysis (NDPCA)-aided distributed source coding algorithm for correlated speech sources transmitting to a central receiver. Our method includes a perception-aware downstream task loss function that balances perceptual realism with task-specific performance. Experiments show significant PSNR improvements under bandwidth constraints over naive autoencoder methods in both task-agnostic (19%) and task-aware (52%) settings. The method also approaches the theoretical upper bound, in which all correlated sources are sent to a single encoder, especially in low-bandwidth scenarios. Additionally, we present a rate-distortion-perception trade-off curve, enabling adaptive decisions based on application-specific realism needs.
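To make the idea of bitrate adaptation without retraining more concrete, the following is a minimal linear sketch, not the paper's implementation. It assumes each source already has a fixed latent representation (simulated here with random data in place of a trained speech encoder), fits an ordinary PCA per source, and then splits a total dimension budget across sources by keeping the globally largest eigenvalues. All function names, the simulated data, and the greedy allocation rule are illustrative assumptions; the neural encoders, the perception-aware loss, and any use of cross-source correlation beyond the allocation step are omitted.

```python
import numpy as np

def fit_source_pca(latents):
    """Return (mean, eigenvalues in descending order, eigenvectors as columns)."""
    mean = latents.mean(axis=0)
    centered = latents - mean
    cov = centered.T @ centered / (len(latents) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, eigvals[order], eigvecs[:, order]

def allocate_bandwidth(eigvals_per_source, total_dims):
    """Greedy split (an assumption): keep the total_dims largest eigenvalues
    across all sources and count how many fall on each source."""
    tagged = [(val, src)
              for src, vals in enumerate(eigvals_per_source)
              for val in vals]
    tagged.sort(key=lambda t: t[0], reverse=True)
    counts = [0] * len(eigvals_per_source)
    for _, src in tagged[:total_dims]:
        counts[src] += 1
    return counts

def encode(latents, mean, eigvecs, k):
    """Project onto the top-k principal directions of this source."""
    return (latents - mean) @ eigvecs[:, :k]

def decode(codes, mean, eigvecs):
    """Map a k-dimensional code back to the latent space."""
    k = codes.shape[1]
    return codes @ eigvecs[:, :k].T + mean

def reconstruction_mse(sources, total_dims):
    """Allocate total_dims across sources and return (counts, mean squared error)."""
    fitted = [fit_source_pca(z) for z in sources]
    counts = allocate_bandwidth([f[1] for f in fitted], total_dims)
    sq_err, n_vals = 0.0, 0
    for z, (mean, _, vecs), k in zip(sources, fitted, counts):
        z_hat = decode(encode(z, mean, vecs, k), mean, vecs)
        sq_err += np.sum((z - z_hat) ** 2)
        n_vals += z.size
    return counts, sq_err / n_vals

# Simulated stand-ins for two correlated latent streams (hypothetical data).
rng = np.random.default_rng(0)
shared = rng.normal(size=(1000, 4))
z1 = shared @ rng.normal(size=(4, 8)) + 0.1 * rng.normal(size=(1000, 8))
z2 = 0.5 * shared @ rng.normal(size=(4, 8)) + 0.1 * rng.normal(size=(1000, 8))

# The same fitted projections serve every budget; only the number of kept
# dimensions changes, so no refitting is needed when bandwidth changes.
for budget in (2, 4, 8, 16):
    counts, mse = reconstruction_mse([z1, z2], budget)
    print(f"budget={budget:2d} dims per source={counts} mse={mse:.4f}")
```

In this sketch the receiver can request any budget at run time, and the stronger source automatically receives more dimensions when the budget is small. The same loop, run over a range of budgets, traces a rate-distortion curve for the simulated latents; adding a realism term would be needed to obtain the rate-distortion-perception trade-off the abstract describes.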

