Emotion Filtering at the Edge
This work addresses privacy concerns for users of voice-controlled IoT devices by preventing cloud services from profiling their emotional states, though it is an incremental application of existing methods to a new domain.
The paper tackles privacy risks in cloud-based speech analysis by developing an edge-based emotion filtering layer that sanitizes voice inputs using CycleGAN-based conversion, reducing emotional state identification by ~91% with only a ~0.16% performance drop in speech recognition.
Voice-controlled devices and services have become very popular in the consumer IoT. Cloud-based speech analysis services extract information from voice inputs using speech recognition techniques. Service providers can thus build very accurate profiles of users' demographic categories, personal preferences, emotional states, etc., and may therefore significantly compromise their privacy. To address this problem, we have developed a privacy-preserving intermediate layer between users and cloud services that sanitizes voice input directly on edge devices. We use CycleGAN-based speech conversion to remove sensitive information from raw voice input signals before regenerating neutralized signals for forwarding. We implement and evaluate our emotion filtering approach on a relatively cheap Raspberry Pi 4 and show that accuracy is not compromised at the edge: for speech recognition, signals generated at the edge differ only slightly (~0.16%) from those of cloud-based approaches. Experimental evaluation of the generated signals shows that identification of the emotional state of a speaker can be reduced by ~91%.
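To make the data flow of the intermediate layer concrete, the following is a minimal sketch and not the authors' implementation. The names `EdgeEmotionFilter`, `placeholder_convert`, and `fake_cloud` are hypothetical, and the converter is a trivial stand-in (peak normalization) used only so the example runs without a trained model. In a real deployment the `convert` slot would presumably hold a trained CycleGAN generator.

```python
from dataclasses import dataclass
from typing import Callable, List

Signal = List[float]  # hypothetical representation: mono audio samples


@dataclass
class EdgeEmotionFilter:
    """Sits between the user and the cloud service on the edge device."""
    convert: Callable[[Signal], Signal]  # stand-in for an emotion-neutralizing converter
    forward: Callable[[Signal], str]     # stand-in for the cloud speech service client

    def handle(self, raw: Signal) -> str:
        neutral = self.convert(raw)      # sanitize on-device
        return self.forward(neutral)     # the raw signal is never forwarded


def placeholder_convert(signal: Signal) -> Signal:
    # NOT a CycleGAN: scales to unit peak purely so the sketch is runnable.
    peak = max((abs(s) for s in signal), default=0.0)
    return [s / peak for s in signal] if peak else list(signal)


sent: List[Signal] = []  # records what actually leaves the device


def fake_cloud(signal: Signal) -> str:
    # Assumed cloud endpoint, replaced here by a local recorder.
    sent.append(signal)
    return "transcript placeholder"


layer = EdgeEmotionFilter(convert=placeholder_convert, forward=fake_cloud)
result = layer.handle([0.0, 0.5, -0.25])
```

The point of the design is that `forward` only ever receives the output of `convert`, so the cloud service has no access to the raw voice input.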