VSVC: Backdoor attack against Keyword Spotting based on Voiceprint Selection and Voice Conversion
The work addresses a security problem that affects manufacturers and users of voice-controlled devices, though the attack method itself is incremental.
The paper studies the vulnerability of deep neural network-based keyword spotting systems to backdoor attacks introduced during third-party training. The proposed attack achieves an average attack success rate of nearly 97% while poisoning less than 1% of the training data.
Keyword spotting (KWS) based on deep neural networks (DNNs) has achieved massive success in voice control scenarios. However, training such DNN-based KWS systems often requires significant data and hardware resources, so manufacturers often entrust this process to a third-party platform. This makes the training process uncontrollable, allowing attackers to implant backdoors in the model by manipulating the third-party training data. An effective backdoor attack can force the model to make specified judgments under certain conditions, known as triggers. In this paper, we design a backdoor attack scheme based on Voiceprint Selection and Voice Conversion, abbreviated as VSVC. Experimental results demonstrate that VSVC achieves an average attack success rate close to 97% across four victim models when poisoning less than 1% of the training data.
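To make the two reported quantities concrete, the following is a minimal sketch of how a poisoning rate and an attack success rate are commonly computed in the backdoor literature. It is an illustration under assumptions, not the paper's evaluation code: the function names `poisoning_rate` and `attack_success_rate` are hypothetical, and the convention of excluding samples that already belong to the target class is a common choice that the abstract does not confirm.

```python
def poisoning_rate(num_poisoned, dataset_size):
    """Fraction of the training set that carries the trigger."""
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    if not 0 <= num_poisoned <= dataset_size:
        raise ValueError("num_poisoned must lie in [0, dataset_size]")
    return num_poisoned / dataset_size


def attack_success_rate(predictions, true_labels, target_label):
    """Fraction of triggered samples classified as the attacker's target.

    Assumption: samples whose true label is already the target label are
    skipped, since predicting the target for them proves nothing.
    """
    hits = 0
    total = 0
    for predicted, truth in zip(predictions, true_labels):
        if truth == target_label:
            continue  # not informative for the attack
        total += 1
        if predicted == target_label:
            hits += 1
    if total == 0:
        raise ValueError("no non-target samples to evaluate")
    return hits / total
```

For example, 50 poisoned utterances in a 10,000-utterance training set give a poisoning rate of 0.5%, which falls within the "less than 1%" budget described above.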