HASP: A High-Performance Adaptive Mobile Security Enhancement Against Malicious Speech Recognition
This work addresses privacy leakage for users of mobile devices and public ASR systems by providing a high-performance security enhancement, though the contribution is incremental because it builds on existing adversarial example techniques.
The paper tackles the security threat of malicious speech recognition by proposing HASP, a method that adds imperceptible adversarial noise to speech to increase the Word Error Rate (WER) of ASR systems. HASP achieves an average WER of 84.55% and processes data 15x to 40x faster than state-of-the-art methods.
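Because HASP's effectiveness is reported as a WER, it helps to see how that metric is usually computed: the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a standard Levenshtein-based WER, not code from the paper; the function name `word_error_rate` and the example sentences are illustrative assumptions. Note that WER can exceed 100% when the ASR output contains many inserted words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, via word-level Levenshtein distance.

    Illustrative sketch only; not the evaluation code used by HASP.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    clean = "turn on the kitchen lights"
    perturbed_transcript = "turn of the chicken light"  # hypothetical ASR output
    print(word_error_rate(clean, perturbed_transcript))
```

In this example, three of the five reference words are substituted, so the WER is 0.6 (60%). A protection scheme like HASP aims to push this value as high as possible for the eavesdropping ASR while leaving the audio intelligible to humans.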
Machine-learning-based Automatic Speech Recognition (ASR) technology is now widespread in smartphones, home devices, and public facilities. As convenient as this technology is, it also raises a considerable security issue: users' speech content may be exposed to malicious ASR monitoring, causing severe privacy leakage. In this work, we propose HASP, a high-performance security enhancement approach that addresses this issue on mobile devices. Leveraging the vulnerability of ASR systems to adversarial examples, HASP adds human-imperceptible adversarial noise to real-time speech, effectively perturbing malicious ASR monitoring by increasing its Word Error Rate (WER). To improve practical performance on mobile devices, HASP is further optimized to adapt to human speech characteristics, environmental noise, and mobile computation scenarios. Experiments show that HASP achieves optimal real-time security enhancement: it induces an average WER of 84.55% in malicious ASR monitoring, and its data processing speed is 15x to 40x faster than that of state-of-the-art methods. Moreover, HASP effectively perturbs various ASR systems, demonstrating strong transferability.
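The abstract does not say how HASP keeps its noise "human imperceptible". A common way adversarial-audio work bounds a perturbation is to clip every sample of the noise to a small L-infinity budget and then report the resulting signal-to-noise ratio (SNR). The sketch below shows only that generic bounding step under that assumption; it does not perform the gradient-based search against an ASR model that produces the noise, and the names `project_linf`, `snr_db`, `perturb_frame`, and the `epsilon` value are hypothetical, not HASP's.

```python
import math
import random


def project_linf(noise, epsilon):
    """Clip each sample of a candidate perturbation to [-epsilon, epsilon]."""
    return [max(-epsilon, min(epsilon, n)) for n in noise]


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise)."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        return math.inf
    return 10.0 * math.log10(p_signal / p_noise)


def perturb_frame(frame, noise, epsilon):
    """Add an L-inf-bounded perturbation and keep samples in [-1, 1].

    Assumption: audio is float PCM in [-1, 1]; the bound is illustrative.
    """
    delta = project_linf(noise, epsilon)
    return [max(-1.0, min(1.0, x + d)) for x, d in zip(frame, delta)]


if __name__ == "__main__":
    random.seed(0)
    sample_rate = 16_000  # assumed 16 kHz mono audio
    # One 20 ms frame (320 samples) of a 440 Hz tone standing in for speech.
    frame = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(320)]
    # Random stand-in for a perturbation that an attack would optimize.
    candidate = [random.uniform(-0.05, 0.05) for _ in frame]
    perturbed = perturb_frame(frame, candidate, epsilon=0.01)
    delta = [p - x for p, x in zip(perturbed, frame)]
    print(f"max |delta| = {max(abs(d) for d in delta):.4f}, "
          f"SNR = {snr_db(frame, delta):.1f} dB")
```

Working frame by frame like this matches the real-time, on-device setting the abstract describes, but the actual imperceptibility criterion HASP uses (for example, a psychoacoustic or speech-adaptive constraint) may differ from a plain L-infinity bound.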