V.O.I.C.E (Voice, Ownership, Identity, Control, Expression): Risk Taxonomy of Synthetic Voice Generation From Empirical Data
Provides a structured risk taxonomy for stakeholders (e.g., policymakers, researchers) to understand and mitigate privacy, security, and governance risks from unconsented voice synthesis, addressing a gap in existing threat models.
The paper develops V.O.I.C.E, a taxonomy of synthetic voice generation risks based on 569 incidents from major databases, 1,067 direct reports from diverse U.S. participants, and 2,221 Reddit discussions, modeling how risks emerge and interact with contextual factors such as exposure and legal protections.
As generative voice models rapidly advance in both capability and public use, the unconsented collection, reuse, and synthesis of voice data are introducing new classes of privacy, security, and governance risk that are poorly captured by existing, largely uniform threat models. To fill this gap, we present V.O.I.C.E, a taxonomy of voice generation risk grounded in a multi-source threat modeling effort with 569 incidents from major AI incident databases, the FTC, and the Internet Crime Complaint Center (IC3); 1,067 direct incident reports from U.S.-based participants across diverse groups (including voice actors, internet personalities, political personnel, and the general public); and 2,221 Reddit discussions. Grounded in real-world data, our taxonomy explicitly models how risk emerges and interacts with contextual factors such as degree of exposure, social visibility, and the availability of legal protections for various affected groups.