A Deep Learning Based Analysis-Synthesis Framework For Unison Singing
This work addresses the challenge of creating realistic unison singing effects for audio processing applications. It appears incremental, since it builds on existing deep-learning methods.
The paper addresses the analysis and synthesis of unison singing. It uses deep learning to analyse the fundamental frequency (F0) distributions of individual singers in choir recordings, and proposes a system that synthesises a unison signal from an a cappella input and a voice prototype. The system is evaluated with subjective listening tests covering quality, melody adherence, and perceived unison.
Unison singing is the name given to an ensemble of singers simultaneously singing the same melody and lyrics. While each individual singer in a unison sings the same principal melody, there are slight timing and pitch deviations between the singers, which, along with the ensemble of timbres, give the listener a perceived sense of "unison". In this paper, we present a study of unison singing in the context of choirs; utilising some recently proposed deep-learning based methodologies, we analyse the fundamental frequency (F0) distribution of the individual singers in recordings of unison mixtures. Based on the analysis, we propose a system for synthesising a unison signal from an a cappella input and a single voice prototype representative of a unison mixture. We use subjective listening tests to evaluate perceptual factors of our proposed system for synthesis, including quality, adherence to the melody, as well as the degree of perceived unison.
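To make the notions of pitch and timing deviation concrete, the following is a minimal illustrative sketch, not the deep-learning system proposed in the paper. It assumes per-singer F0 tracks in Hz are already available and measures their per-frame spread in cents, then builds a toy "unison" by summing sine tones with random detuning and onset delays. The function names `cents_dispersion` and `toy_unison`, and the default values (10 cents, 30 ms, 4 voices), are hypothetical choices made for illustration only.

```python
import numpy as np


def cents_dispersion(f0_tracks):
    """Per-frame standard deviation of pitch across singers, in cents.

    f0_tracks: array of shape (n_singers, n_frames), F0 in Hz, all values > 0.
    """
    f0 = np.asarray(f0_tracks, dtype=float)
    # Convert Hz to a log-frequency scale where 1200 units equal one octave.
    log_f0 = 1200.0 * np.log2(f0)
    # Deviation of each singer from the per-frame mean pitch
    # (equivalent to the geometric mean of the F0 values in Hz).
    deviation = log_f0 - log_f0.mean(axis=0, keepdims=True)
    return deviation.std(axis=0)


def toy_unison(f0_hz, n_voices=4, cents_std=10.0, max_delay_s=0.03,
               sr=16000, duration_s=1.0, seed=0):
    """Sum of detuned, delayed sine tones: a crude stand-in for a unison.

    All default values are illustrative assumptions, not figures from the paper.
    """
    rng = np.random.default_rng(seed)
    n = int(sr * duration_s)
    t = np.arange(n) / sr
    mix = np.zeros(n)
    for _ in range(n_voices):
        detune_cents = rng.normal(0.0, cents_std)       # pitch deviation
        delay = int(rng.uniform(0.0, max_delay_s) * sr)  # timing deviation
        voice = np.sin(2.0 * np.pi * f0_hz * 2.0 ** (detune_cents / 1200.0) * t)
        delayed = np.zeros(n)
        delayed[delay:] = voice[:n - delay]
        mix += delayed
    # Normalise by the number of voices so the mix stays within [-1, 1].
    return mix / n_voices


if __name__ == "__main__":
    # Two singers, the second consistently 10 cents sharp of the first.
    tracks = np.array([[440.0, 220.0],
                       [440.0 * 2 ** (10 / 1200), 220.0 * 2 ** (10 / 1200)]])
    print(cents_dispersion(tracks))
    print(toy_unison(220.0).shape)
```

A real unison also involves differing timbres and time-varying deviations, which this sketch deliberately ignores; it only illustrates the quantities that an F0 distribution analysis would describe.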