ASAug 19, 2023
Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique ConversionTung-Cheng Su, Yung-Chuan Chang, Yi-Wen Liu
Singing technique conversion (STC) refers to the task of converting from one voice technique to another while leaving the original singer identity, melody, and linguistic components intact. Previous STC studies, as well as singing voice conversion research in general, have utilized convolutional autoencoders (CAEs) for conversion, but how the bottleneck width of the CAE affects the synthesis quality has not been thoroughly evaluated. To this end, we constructed a GAN-based multi-domain STC system which took advantage of the WORLD vocoder representation and the CAE architecture. We varied the bottleneck width of the CAE, and evaluated the conversion results subjectively. The model was trained on a Mandarin dataset which features four singers and four singing techniques: the chest voice, the falsetto, the raspy voice, and the whistle voice. The results show that a wider bottleneck corresponds to better articulation clarity but does not necessarily lead to higher likeness to the target technique. Among the four techniques, we also found that the whistle voice is the easiest target for conversion, while the other three techniques as a source produce more convincing conversion results than the whistle.
SPMay 30, 2019
Transient-evoked otoacoustic emission signals predicting outcomes of acute sensorineural hearing loss in patients with Meniere's DiseaseYi-Wen Liu, Sheng-Lun Kao, Hau-Tieng Wu et al.
Background: Fluctuating hearing loss is characteristic of Meniere's Disease (MD) during acute episodes. However, no reliable audiometric hallmarks are available for counselling the hearing recovery possibility. Aims/Objectives: To find parameters for predicting MD hearing outcomes. Material and Methods: We applied machine learning techniques to analyse transient-evoked otoacoustic emission (TEOAE) signals recorded from patients with MD. Thirty unilateral MD patients were recruited prospectively after onset of acute cochleo-vestibular symptoms. Serial TEOAE and pure-tone audiogram (PTA) data were recorded longitudinally. Denoised TEOAE signals were projected onto the three most prominent principal directions through a linear transformation. Binary classification was performed using a support vector machine (SVM). TEOAE signal parameters, including signal energy and group delay, were compared between improved and nonimproved groups using Welchs t-test. Results: Signal energy did not differ (p = 0.64) but a significant difference in 1-kHz (p = 0.045) group delay was recorded between improved and nonimproved groups. The SVM achieved a cross-validated accuracy of >80% in predicting hearing outcomes. Conclusions and Significance: This study revealed that baseline TEOAE parameters obtained during acute MD episodes, when processed through machine learning technology, may provide information on outer hair cell function to predict hearing recovery.