CL · Jun 11, 2024

Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices

arXiv:2406.07504v1

Originality: Synthesis-oriented

AI Analysis

This paper addresses the failure of voice cloning models to preserve queer vocal styles, which affects accessibility for LGBTQ+ communities. Its contribution is incremental: it uses an existing pipeline as the basis for raising ethical concerns.

The study tested whether a voice cloning pipeline captures "gay voice" and found that it homogenized speech: synthesized versions were rated significantly "less gay" than the originals for speakers with that style, while ratings increased for control speakers. This loss affects both accessibility and perceived speaker similarity. The study also highlights ethical risks, including safety and fairness issues in data collection and potential harms from misuse.

Modern voice cloning models claim to be able to capture a diverse range of voices. We test the ability of a typical pipeline to capture the style known colloquially as "gay voice" and notice a homogenisation effect: synthesised speech is rated as sounding significantly "less gay" (by LGBTQ+ participants) than its corresponding ground-truth for speakers with "gay voice", but ratings actually increase for control speakers. Loss of "gay voice" has implications for accessibility. We also find that for speakers with "gay voice", loss of "gay voice" corresponds to lower similarity ratings. However, we caution that improving the ability of such models to synthesise "gay voice" comes with a great number of risks. We use this pipeline as a starting point for a discussion on the ethics of modelling queer voices more broadly. Collecting "clean" queer data has safety and fairness ramifications, and the resulting technology may cause harms ranging from mockery to death.

