AS · CR · SD · SP · May 29

Acoustic Simulation Framework for Multi-channel Replay Speech Detection

arXiv:2509.14789 · 6.51 citations · h-index: 7
Predicted impact: top 29% in AS (last 90 days) · Originality: incremental advance
AI Analysis

This work addresses the scarcity of multi-channel replay speech datasets, which limits how robust voice-controlled systems can be made against replay attacks, particularly in smart environments.

This paper introduces an acoustic simulation framework that generates multi-channel replay speech data from publicly available resources. The authors train a state-of-the-art multi-channel replay detector, M-ALRAD, on synthetic data only and evaluate how well it generalizes to a real-recording corpus, ReMASC, without using any real training data. They also extend M-ALRAD with inter-channel phase difference features to make better use of spatial information.

Replay speech attacks pose a significant threat to voice-controlled systems, especially in smart environments where voice assistants are widely deployed. While multi-channel audio offers spatial cues that can make replay detection more robust, existing datasets and methods predominantly rely on single-channel recordings. Moreover, previous studies have highlighted that generalizing replay detection to new environments is challenging, which calls for methods that generate data covering a wide range of acoustic conditions. In this work, we therefore introduce an acoustic simulation framework that simulates multi-channel replay speech configurations using publicly available resources. Using the framework, we train the state-of-the-art multi-channel replay detector M-ALRAD and evaluate its generalization on the ReMASC real-recording corpus without any real training data. To better exploit spatial information, we extend M-ALRAD with inter-channel phase difference features computed for adjacent microphone pairs, augmenting the beamformed representation with directional cues. The synthetic datasets will be made available upon acceptance of the paper.
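
The abstract does not say how the inter-channel phase difference (IPD) features are computed, so the following is only a minimal sketch under a common definition: for each adjacent microphone pair, the IPD at a frequency bin is the phase of the cross-spectrum between the two channels. The function names `dft_frame` and `adjacent_ipd`, the single unwindowed frame, and the toy sinusoid input are all illustrative assumptions, not the authors' implementation.

```python
import cmath
import math


def dft_frame(frame):
    """Naive DFT of one real-valued frame, keeping bins 0..N/2."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def adjacent_ipd(frames):
    """IPD per frequency bin for each adjacent channel pair (m, m+1).

    frames: list of M equal-length frames, one per microphone.
    Returns M-1 lists of phase differences in radians, in [-pi, pi].
    """
    spectra = [dft_frame(f) for f in frames]
    return [
        # Phase of the cross-spectrum X_m * conj(X_{m+1}) at each bin.
        [cmath.phase(a * b.conjugate()) for a, b in zip(spectra[m], spectra[m + 1])]
        for m in range(len(spectra) - 1)
    ]


# Toy input (assumed): a sinusoid centred on bin K that reaches three
# microphones with a phase lag growing by LAG radians per microphone.
N, K, LAG = 64, 4, 0.3
frames = [
    [math.cos(2 * math.pi * K * t / N - m * LAG) for t in range(N)]
    for m in range(3)
]
ipd = adjacent_ipd(frames)
```

In this toy setup, the IPD at bin `K` for each adjacent pair recovers the per-microphone lag `LAG`, which is the kind of directional cue the abstract refers to. A practical front end would typically apply this per STFT frame with a window, and may feed the cosine and sine of the IPD to the model instead of the raw angle to avoid phase-wrapping discontinuities.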
