SD AI AS

May 3, 2024

Toward end-to-end interpretable convolutional neural networks for waveform signals

Linh Vu, Thu Tran, Wern-Han Lim, Raphael Phan

arXiv:2405.01815v1
2.71 citations
h-index: 1

Originality: Incremental advance

AI Analysis

The framework offers a portable solution for researchers and practitioners in audio processing who need interpretable models. The advance appears incremental, however, because it builds on existing CNN methods for waveform data.

The paper addresses the problem of building efficient, interpretable deep learning models for raw waveform signals such as audio. It introduces a CNN framework that outperforms Mel spectrogram features by up to 7% on three standard speech emotion recognition datasets.

This paper introduces a novel convolutional neural network (CNN) framework tailored for end-to-end audio deep learning models, with advances in efficiency and explainability. In benchmarking experiments on three standard speech emotion recognition datasets with five-fold cross-validation, our framework outperforms Mel spectrogram features by up to seven percent. It can potentially replace Mel-Frequency Cepstral Coefficients (MFCC) while remaining lightweight. Furthermore, we demonstrate the efficiency and interpretability of the front-end layer on the PhysioNet Heart Sound Database, illustrating its ability to handle long waveforms and capture their intricate patterns. Our contributions offer a portable solution for building efficient and interpretable models for raw waveform data.
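
The abstract reports results under five-fold cross-validation. As a rough illustration of that evaluation scheme only, the sketch below partitions sample indices into five folds and averages a per-fold score. It is not the authors' code: the function names `five_fold_splits` and `cross_validate`, the fixed seed, and the plain random shuffle are all assumptions made for this example, and the paper's actual splitting protocol (for instance, whether folds are stratified or speaker-independent) is not stated here.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples).

    Hypothetical helper for illustration; uses a plain random shuffle.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx


def cross_validate(n_samples, evaluate, n_folds=5, seed=0):
    """Average evaluate(train_idx, test_idx) over all folds.

    `evaluate` is a caller-supplied function (an assumption of this sketch)
    that trains on train_idx and returns a score measured on test_idx.
    """
    scores = [
        evaluate(train_idx, test_idx)
        for train_idx, test_idx in five_fold_splits(n_samples, n_folds, seed)
    ]
    return sum(scores) / len(scores)
```

In use, `evaluate` would train a model on the training indices and return its accuracy on the held-out fold, so that every sample is used for testing exactly once across the five folds.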
