AS, LG, SD | Aug 19, 2023

Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion

arXiv:2308.10021v1 | h-index: 13
Originality: Synthesis-oriented
AI Analysis

This work addresses singing technique conversion for audio synthesis applications. Its contribution is incremental: it evaluates a single architectural parameter of an existing method.

The study investigated how the bottleneck width of a convolutional autoencoder affects synthesis quality in a StarGAN-based singing technique conversion system. It found that wider bottlenecks improve articulation clarity but do not always increase likeness to the target technique, and that the whistle voice is the easiest target for conversion.

Singing technique conversion (STC) refers to the task of converting from one voice technique to another while leaving the original singer identity, melody, and linguistic components intact. Previous STC studies, as well as singing voice conversion research in general, have utilized convolutional autoencoders (CAEs) for conversion, but how the bottleneck width of the CAE affects the synthesis quality has not been thoroughly evaluated. To this end, we constructed a GAN-based multi-domain STC system which took advantage of the WORLD vocoder representation and the CAE architecture. We varied the bottleneck width of the CAE, and evaluated the conversion results subjectively. The model was trained on a Mandarin dataset which features four singers and four singing techniques: the chest voice, the falsetto, the raspy voice, and the whistle voice. The results show that a wider bottleneck corresponds to better articulation clarity but does not necessarily lead to higher likeness to the target technique. Among the four techniques, we also found that the whistle voice is the easiest target for conversion, while the other three techniques as a source produce more convincing conversion results than the whistle.
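
To make concrete what varying the bottleneck width changes, the sketch below computes how strongly a hypothetical convolutional encoder compresses its input at different widths. It is not the authors' architecture: it assumes that "width" means the channel count at the bottleneck, and the layer count, kernel size, stride, padding, input length (128 frames), and feature dimension (36 coefficients per frame) are illustrative values chosen here, not taken from the paper.

```python
from dataclasses import dataclass


def conv_out_len(n, kernel, stride, padding):
    """Output length of a 1-D convolution (standard floor formula)."""
    return (n + 2 * padding - kernel) // stride + 1


@dataclass
class BottleneckShape:
    channels: int  # assumed meaning of "bottleneck width"
    frames: int    # time steps left after downsampling

    @property
    def size(self):
        # Total number of values the decoder receives from the encoder.
        return self.channels * self.frames


def bottleneck_shape(in_frames, bottleneck_channels,
                     n_down=2, kernel=4, stride=2, padding=1):
    """Shape at the bottleneck of a hypothetical encoder made of
    n_down strided convolutions along the time axis."""
    frames = in_frames
    for _ in range(n_down):
        frames = conv_out_len(frames, kernel, stride, padding)
    return BottleneckShape(bottleneck_channels, frames)


def compression_ratio(in_features, in_frames, shape):
    """Input values divided by bottleneck values (higher = tighter)."""
    return (in_features * in_frames) / shape.size


if __name__ == "__main__":
    # Hypothetical input: 36 spectral coefficients x 128 frames.
    for width in (8, 16, 32, 64):
        shape = bottleneck_shape(128, width)
        ratio = compression_ratio(36, 128, shape)
        print(f"width={width:3d} frames={shape.frames} ratio={ratio:.2f}")
```

Under these assumptions, doubling the width halves the compression ratio, so a wider bottleneck lets more of the source signal pass through to the decoder. That is one plausible reading of the reported trade-off: more detail preserved (clearer articulation) but also more source-technique information retained, which may limit likeness to the target.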
