Quality assessment of voice converted speech using articulatory features
This work addresses the need for objective quality assessment in voice conversion systems. The contribution is incremental: it applies an existing acoustic-to-articulatory inversion method to a new domain.
The paper assesses the quality of voice-converted speech by quantifying the loss of articulatory information. It reports increased RMSE and decreased mutual information for both male and female voices, and shows that articulatory features correlate better with human opinion scores than the traditional MCD measure.
We propose a novel application of acoustic-to-articulatory inversion to the quality assessment of voice-converted speech. Effortless human speech requires coordinated movements of various articulators, muscles, etc. This coordinated movement contributes to naturalness, intelligibility and speaker identity, which are only partially present in voice-converted speech. Hence, information related to speech production is lost during voice conversion. In this paper, we quantify this loss for a male voice by showing an increase in RMSE for voice-converted speech, followed by a decrease in mutual information. Similar results are obtained for a female voice. We extend this observation by showing that articulatory features can be used as an objective quality measure. The effectiveness of the proposed measure over mel-cepstral distortion (MCD) is illustrated by comparing their correlations with the Mean Opinion Score (MOS).
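The three quantities named in the abstract (RMSE, mutual information, and correlation with MOS) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' procedure: the function names are hypothetical, the trajectories are synthetic signals instead of real articulatory data, mutual information is taken between a reference and an estimated trajectory of a single articulator channel using a simple histogram estimator, and the correlation is a plain Pearson coefficient.

```python
import numpy as np


def rmse(reference, estimate):
    """Root-mean-square error between two trajectories of equal shape."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of mutual information, in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint probability table
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


def pearson(a, b):
    """Pearson correlation, e.g. between an objective score and MOS."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic stand-ins (assumption): one articulator channel over 2000 frames.
rng = np.random.default_rng(0)
frames = np.linspace(0.0, 20.0 * np.pi, 2000)
reference = np.sin(frames)
# Inversion output for natural speech: small deviation from the reference.
est_natural = reference + rng.normal(0.0, 0.05, reference.shape)
# Inversion output for voice-converted speech: larger deviation.
est_converted = reference + rng.normal(0.0, 0.50, reference.shape)

rmse_natural = rmse(reference, est_natural)
rmse_converted = rmse(reference, est_converted)
mi_natural = mutual_information(reference, est_natural)
mi_converted = mutual_information(reference, est_converted)

print(f"RMSE natural={rmse_natural:.3f}  converted={rmse_converted:.3f}")
print(f"MI   natural={mi_natural:.3f}  converted={mi_converted:.3f}")
```

With per-utterance objective scores and the corresponding listener ratings in hand, `pearson(scores, mos)` would give the kind of correlation used to compare an articulatory measure against MCD. The histogram estimator is sensitive to the choice of `bins`, so the absolute values are only indicative.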