Do Not Return Similarity: Face Recovery with Distance
This work highlights a critical privacy vulnerability in deployed face verification systems, posing a direct threat to user data security.
The paper tackles the privacy risks of embedding leakage in ML systems, demonstrating that attackers can recover original face photos from leaked distance information with a 93.65% success rate under black-box conditions.
Machine learning (ML) has already been integrated into many kinds of systems, helping developers solve problems with accuracy that can exceed that of human beings. However, when integrating ML models into a system, developers may not take enough care with the models' outputs, mainly because they are unfamiliar with ML and AI. This can lead to severe consequences, such as violating data owners' privacy. In this work, we focus on understanding the risks of misusing the embeddings produced by ML models, an important and popular way of using ML. To illustrate the consequences, we identify several channels through which embeddings are accidentally leaked. As our study shows, a face verification system deployed by a government organization that leaks only the distance to the authentic user's photo allows an attacker to exactly recover the embedding of the verifier's pre-installed photo. Furthermore, we find that with the leaked embedding, attackers can easily recover the input photo with negligible loss of quality, indicating devastating consequences for users' privacy. This is achieved with our GAN-like model, which achieves a 93.65% success rate against a popular face embedding model under a black-box assumption.
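Why a distance-only interface can leak the exact embedding can be illustrated with a minimal sketch. It is not the method from the study. It rests on our own assumptions: the verifier returns the exact Euclidean (L2) distance between the embedding of a submitted photo and the hidden embedding e, and the attacker can compute the embeddings q_i of their own probe photos with the same model. Under those assumptions, each query gives ||q_i - e||² = d_i². Subtracting the first equation from the others cancels ||e||² and leaves a linear system, so d + 1 probes in general position determine e uniquely (trilateration). The oracle `leaked_distance`, the helper `recover_embedding`, the 128-dimensional embedding size, and the random probe vectors are all hypothetical stand-ins for a real verifier and real photos.

```python
import numpy as np


def recover_embedding(probes: np.ndarray, distances: np.ndarray) -> np.ndarray:
    """Recover a hidden embedding e from leaked L2 distances to known probes.

    probes:    (m, d) array of probe embeddings q_i known to the attacker, m >= d + 1.
    distances: (m,) array of leaked distances d_i = ||q_i - e||_2.
    """
    q0, d0 = probes[0], distances[0]
    # ||q_i - e||^2 = ||q_i||^2 - 2 q_i.e + ||e||^2 = d_i^2.
    # Subtracting the i = 0 equation cancels ||e||^2, leaving A e = b with
    # A_i = 2 (q_i - q_0) and b_i = ||q_i||^2 - ||q_0||^2 - d_i^2 + d_0^2.
    A = 2.0 * (probes[1:] - q0)
    b = (np.sum(probes[1:] ** 2, axis=1) - np.sum(q0 ** 2)
         - distances[1:] ** 2 + d0 ** 2)
    # Least squares solves the exact case and also accepts extra probes.
    e, *_ = np.linalg.lstsq(A, b, rcond=None)
    return e


# --- Demo with a simulated verifier (all values hypothetical) ---
rng = np.random.default_rng(0)
dim = 128  # assumed embedding size
secret = rng.normal(size=dim)
secret /= np.linalg.norm(secret)  # assume an L2-normalized face embedding


def leaked_distance(probe: np.ndarray) -> float:
    """Hypothetical oracle: the only value the verifier returns."""
    return float(np.linalg.norm(probe - secret))


# Stand-ins for embeddings of attacker-chosen photos.
probes = rng.normal(size=(dim + 1, dim))
dists = np.array([leaked_distance(p) for p in probes])
est = recover_embedding(probes, dists)
print("max abs error:", np.max(np.abs(est - secret)))
```

If a real system rounded or perturbed the distances, the recovery would no longer be exact; querying more than d + 1 probes and keeping the least-squares solve is one plausible way to average out that error.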