SD, CV, LG, AS, IV | May 25, 2019

Reconstructing faces from voices

arXiv:1905.10604v2 | 6 citations
Originality: Incremental advance
AI Analysis

The paper addresses a voice profiling challenge relevant to applications such as biometrics and forensics, but the advance is incremental because it builds on existing GAN methods.

The paper tackles the problem of reconstructing a person's face from their voice using a GAN-based framework, achieving matching accuracies much better than chance.

Voice profiling aims at inferring various human parameters from their speech, e.g. gender, age, etc. In this paper, we address the challenge posed by a subtask of voice profiling - reconstructing someone's face from their voice. The task is designed to answer the question: given an audio clip spoken by an unseen person, can we picture a face that has as many common elements, or associations as possible with the speaker, in terms of identity? To address this problem, we propose a simple but effective computational framework based on generative adversarial networks (GANs). The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers, on a training set. We evaluate the performance of the network by leveraging a closely related task - cross-modal matching. The results show that our model is able to generate faces that match several biometric characteristics of the speaker, and results in matching accuracies that are much better than chance.
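
As a rough illustration of the cross-modal matching evaluation mentioned in the abstract, the sketch below scores a two-candidate matching task: given a probe embedding and two candidate embeddings (one from the true speaker, one distractor), it picks the candidate with the higher cosine similarity. The toy embedding vectors, the choice of cosine similarity, and the function names are assumptions made for illustration, not the authors' evaluation protocol. With two candidates, chance accuracy is 50%.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def matching_accuracy(trials):
    """Fraction of trials where the true candidate beats the distractor.

    Each trial is a (probe, true_candidate, distractor) triple of
    embedding vectors. These names are hypothetical.
    """
    correct = 0
    for probe, true_candidate, distractor in trials:
        score_true = cosine_similarity(probe, true_candidate)
        score_distractor = cosine_similarity(probe, distractor)
        if score_true > score_distractor:
            correct += 1
    return correct / len(trials)


# Toy 2-D embeddings (made up for this sketch, not real data).
trials = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),    # true candidate is closer
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),    # true candidate is closer
    ([1.0, 1.0], [1.0, 0.0], [1.0, 1.0]),    # distractor is closer (a miss)
    ([0.5, 0.5], [0.4, 0.6], [1.0, -1.0]),   # true candidate is closer
]

accuracy = matching_accuracy(trials)
print(f"matching accuracy: {accuracy:.2f} (chance = 0.50)")
```

In a setup like the one the abstract describes, the probe would plausibly be derived from a generated face and the candidates from real face images, so an accuracy well above 0.50 would suggest that generated faces retain identity information from the voice.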

Code Implementations: 1 repo
