SD · HC · AS · Dec 10, 2017

Prodorshok I: A Bengali Isolated Speech Dataset for Voice-Based Assistive Technologies - A comparative analysis of the effects of data augmentation on HMM-GMM and DNN classifiers

arXiv:1712.03579v1 · 5 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for speaker-independent ASR in Bengali to enhance human-computer interaction, but it is incremental: it applies existing methods to a new dataset.

The paper presents a Bengali isolated-speech dataset for voice-based assistive technologies and finds that simple data augmentation with a small pitch shift tangibly improves accuracy in speech recognition systems.

Prodorshok I is a Bengali isolated word dataset tailored to help create speaker-independent, voice-command driven automated speech recognition (ASR) based assistive technologies to help improve human-computer interaction (HCI). This paper presents the results of an objective analysis that was undertaken using a subset of words from Prodorshok I to assess its reliability in ASR systems that utilize Hidden Markov Models (HMM) with Gaussian emissions and Deep Neural Networks (DNN). The results show that simple data augmentation involving a small pitch shift can make surprisingly tangible improvements to accuracy levels in speech recognition.
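To make the augmentation idea concrete, here is a minimal sketch of pitch-shift augmentation for a labelled set of isolated-word recordings. The abstract does not say how the shift was implemented or how large it was, so everything here is an assumption for illustration: the function names, the default shift of one semitone up and down, and the use of naive resampling with linear interpolation. Resampling also changes the clip duration, which a phase-vocoder-based pitch shift would avoid.

```python
import math


def pitch_shift_by_resampling(samples, semitones):
    """Naively shift pitch by reading the signal at a different rate.

    Positive `semitones` raises the pitch, negative lowers it. The output
    duration changes by the same factor (an assumption-level shortcut; a
    phase-vocoder approach would keep the duration fixed).
    """
    factor = 2.0 ** (semitones / 12.0)  # frequency ratio for the shift
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor                 # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Linear interpolation between the two neighbouring samples.
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def augment(dataset, semitone_shifts=(-1.0, 1.0)):
    """Return the original (samples, label) pairs plus pitch-shifted copies.

    The default shifts are hypothetical values, not taken from the paper.
    """
    augmented = list(dataset)
    for samples, label in dataset:
        for shift in semitone_shifts:
            augmented.append((pitch_shift_by_resampling(samples, shift), label))
    return augmented
```

With the default settings, each recording yields two extra training examples that keep the original word label, so a classifier (HMM-GMM or DNN) sees three times as many utterances.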
