AS · SD · Jun 10, 2020

Speaker Diarization: Using Recurrent Neural Networks

arXiv:2006.05596v1 · Has Code
AI Analysis

This work addresses speaker diarization for audio processing, but it is incremental as it applies existing neural network methods to a specific dataset.

The paper tackles speaker diarization by training four neural networks (RNN, CNN, MLP, and SLP) on a two-channel audio file with two speakers, achieving approximately 92% accuracy with the RNN.

Speaker diarization is the problem of separating speakers in an audio recording. There could be any number of speakers, and the final result should state when each speaker starts and stops speaking. In this project, we analyze a given audio file with 2 channels and 2 speakers (each on a separate channel). We train neural networks to learn when a person is speaking. We use different types of neural networks, specifically a Single Layer Perceptron (SLP), a Multi Layer Perceptron (MLP), a Recurrent Neural Network (RNN), and a Convolutional Neural Network (CNN), and we achieve approximately 92% accuracy with the RNN. The code for this project is available at https://github.com/vishalshar/SpeakerDiarization_RNN_CNN_LSTM
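The abstract says the final result should state when each speaker starts and ends. As a minimal sketch of that last step only, the code below assumes a trained network has already produced one 0/1 "is speaking" label per fixed-length frame for each channel, and collapses those labels into time segments. The function names, the frame length, and the per-frame label format are illustrative assumptions, not taken from the paper or its repository.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def frames_to_segments(labels: List[int], frame_sec: float) -> List[Segment]:
    """Collapse per-frame 0/1 speech labels into (start, end) times in seconds.

    Hypothetical helper: assumes every frame covers `frame_sec` seconds and
    that frames do not overlap.
    """
    segments: List[Segment] = []
    start = None  # index of the first frame of the currently open segment
    for i, is_speech in enumerate(labels):
        if is_speech and start is None:
            start = i  # speech begins at this frame
        elif not is_speech and start is not None:
            segments.append((start * frame_sec, i * frame_sec))
            start = None  # speech ended just before this frame
    if start is not None:
        # The recording ended while the speaker was still talking.
        segments.append((start * frame_sec, len(labels) * frame_sec))
    return segments


def diarize_two_channels(
    ch0_labels: List[int], ch1_labels: List[int], frame_sec: float
) -> Dict[str, List[Segment]]:
    """Build one segment list per speaker, assuming one speaker per channel."""
    return {
        "speaker_0": frames_to_segments(ch0_labels, frame_sec),
        "speaker_1": frames_to_segments(ch1_labels, frame_sec),
    }


if __name__ == "__main__":
    # Made-up labels for illustration; 0.5 s frames are an assumption.
    result = diarize_two_channels([0, 1, 1, 0, 0, 1], [1, 0, 0, 1, 1, 0], 0.5)
    for speaker, segs in result.items():
        print(speaker, segs)
```

Because each speaker is on a separate channel in this setup, the two label sequences can be processed independently; a single-channel recording would need an extra step to attribute speech frames to speakers.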
