SD CL LG AS

Dec 17, 2018

Persian Vowel recognition with MFCC and ANN on PCVC speech dataset

Saber Malekzadeh, Mohammad Hossein Gholizadeh, Seyed Naser Razavi

arXiv:1812.06953v1

2.91 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses speech recognition for Persian language users, but it is incremental as it applies existing methods (MFCC and ANN) to a new dataset.

The paper tackles the problem of recognizing Persian vowel phonemes in consonant-vowel combinations, using MFCC features and an ANN classifier on the new PCVC speech dataset, and reports the average recognition rate for vowels.

This paper proposes a new method for recognizing consonant-vowel phoneme combinations on a new Persian speech dataset, PCVC (Persian Consonant-Vowel Combination), which is used to recognize Persian phonemes. The PCVC dataset contains 20 sets of audio samples from 10 speakers, covering combinations of the 23 consonant and 6 vowel phonemes of the Persian language. Each sample combines one consonant and one vowel: the consonant phoneme is pronounced first, immediately followed by the vowel phoneme. Each sound sample is a 2-second frame of audio containing, on average, 0.5 seconds of speech; the rest is silence. The proposed method computes MFCCs (Mel Frequency Cepstral Coefficients) on every partitioned sound sample. The MFCC vector of every training sample is then given to a multilayer perceptron feed-forward ANN (Artificial Neural Network) for training. Finally, the test samples are evaluated on the trained ANN model for phoneme recognition. After training and testing, results are presented for vowel recognition, and the average recognition percentage for vowel phonemes is computed.
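The pipeline described in the abstract (isolate the speech segment, compute MFCCs, train a feed-forward network on the resulting vectors) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the frame size, hop, number of filters, number of coefficients, network size, or how per-frame MFCCs become one vector per sample, so every parameter value below is an assumption, as are the energy-based silence trimming and all function names (`trim_silence`, `mel_filterbank`, `mfcc`, `train_mlp`, `predict`).

```python
import numpy as np

def trim_silence(signal, sr, frame_ms=20, rel_threshold=0.1):
    """Keep the span between the first and last frame whose RMS energy
    exceeds rel_threshold * max RMS (assumed trimming rule, not from the paper).
    Assumes the signal is not entirely silent."""
    n = int(sr * frame_ms / 1000)
    n_frames = len(signal) // n
    frames = signal[:n_frames * n].reshape(n_frames, n)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    active = np.flatnonzero(rms > rel_threshold * rms.max())
    return signal[active[0] * n:(active[-1] + 1) * n]

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sr, n_fft=512, hop=160, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs); needs len(signal) >= n_fft."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)            # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    n = np.arange(n_filters)                            # unnormalized DCT-II basis
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T

def train_mlp(X, y, n_hidden=16, n_classes=6, lr=0.1, epochs=500, seed=0):
    """One-hidden-layer perceptron (tanh + softmax) trained by full-batch
    gradient descent on cross-entropy. n_classes=6 matches the 6 Persian vowels."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.1, (X.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_classes)); b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                            # one-hot targets
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)               # softmax probabilities
        d_logits = (P - Y) / len(X)
        d_H = d_logits @ W2.T * (1 - H ** 2)            # backprop through tanh
        W2 -= lr * H.T @ d_logits; b2 -= lr * d_logits.sum(axis=0)
        W1 -= lr * X.T @ d_H;      b1 -= lr * d_H.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    """Return the most likely class index for each row of X."""
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
```

Under these assumptions, a 2-second recording would be passed through `trim_silence`, then `mfcc`, and the per-frame coefficients reduced to one vector per sample (for example with `mfcc(...).mean(axis=0)`) before being stacked into `X` for `train_mlp`. Averaging over frames is only one possible reduction; the abstract does not say which the authors used.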
