Voice Passing: a Non-Binary Voice Gender Prediction System for evaluating Transgender voice transition
This work addresses the need for non-binary voice gender prediction tools to support transgender individuals and voice therapists during voice transition, although its improvement over existing methods is incremental.
The paper tackled the problem of evaluating transgender voice transition by developing a system that predicts a continuous Voice Femininity Percentage (VFP) from voice recordings, achieving higher accuracy than models based on fundamental frequency or vocal tract length. It used a corpus of 41 French speakers and perceptual evaluations from 57 participants to calibrate binary gender classification models.
This paper presents software that describes voices using a continuous Voice Femininity Percentage (VFP). The system is intended for transgender speakers during their voice transition and for the voice therapists supporting them in this process. A corpus of 41 French cis- and transgender speakers was recorded. In a perceptual evaluation, 57 participants estimated the VFP of each voice. Binary gender classification models were trained on external gender-balanced data and applied to overlapping windows to obtain average gender prediction estimates, which were then calibrated to predict VFP; the calibrated models achieved higher accuracy than models based on $F_0$ or vocal tract length. Training data speaking style and DNN architecture were shown to impact VFP estimation. The accuracy of the models was affected by speakers' age. This highlights the importance of style, age, and the conception of gender as binary or not when building adequate statistical representations of cultural concepts.
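The pipeline described above (binary classifier scores on overlapping windows, averaged per recording, then calibrated against perceptual VFP) can be illustrated with a minimal sketch. The abstract does not specify the calibration method, the window length, or the hop size, so everything here is an assumption: the linear least-squares calibration, the function names, and the `classifier` callable (a stand-in for any trained binary gender model returning a probability in [0, 1]) are all hypothetical and are not the authors' implementation.

```python
import numpy as np

def sliding_windows(signal, win, hop):
    """Split a 1-D signal into overlapping windows of length `win`, stepping by `hop`."""
    if len(signal) < win:
        raise ValueError("signal is shorter than one window")
    starts = range(0, len(signal) - win + 1, hop)
    return np.stack([signal[s:s + win] for s in starts])

def mean_window_score(signal, classifier, win, hop):
    """Average the per-window binary-classifier outputs into one score per recording.

    `classifier` is a hypothetical callable mapping one window to a
    probability in [0, 1] for the 'female' class.
    """
    windows = sliding_windows(signal, win, hop)
    return float(np.mean([classifier(w) for w in windows]))

def fit_linear_calibration(scores, perceived_vfp):
    """Fit a least-squares line from averaged scores to perceptual VFP (0-100).

    Assumption: a linear map; the paper's actual calibration may differ.
    """
    slope, intercept = np.polyfit(scores, perceived_vfp, deg=1)
    return float(slope), float(intercept)

def predict_vfp(score, slope, intercept):
    """Apply the calibration and clip the result to a valid percentage."""
    return float(np.clip(slope * score + intercept, 0.0, 100.0))

# Toy usage with made-up numbers (not data from the paper).
toy_scores = np.array([0.0, 0.5, 1.0])     # averaged classifier outputs
toy_vfp = np.array([10.0, 50.0, 90.0])     # mean listener ratings
slope, intercept = fit_linear_calibration(toy_scores, toy_vfp)
estimate = predict_vfp(0.25, slope, intercept)
```

Averaging over windows yields one score per recording regardless of its duration, and the calibration step is what turns a binary model's output into a continuous percentage comparable with listener ratings.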