PI-Whisper: Designing an Adaptive and Incremental Automatic Speech Recognition System for Edge Devices
This work addresses resource-constrained ASR systems for edge devices and improves equity and fairness across diverse speaker groups, though its contribution appears modest because it builds on existing ASR technologies.
The paper tackles the challenges of adaptivity, incrementality, and inclusivity in edge-based automatic speech recognition (ASR) for diverse populations by proposing PI-Whisper, which achieves state-of-the-art accuracy and reduces word error rate (WER) by up to 13.7% relative to baselines.
Edge-based automatic speech recognition (ASR) technologies are increasingly prevalent in the development of intelligent and personalized assistants. However, resource-constrained ASR models face significant challenges in adaptivity, incrementality, and inclusivity when serving a diverse population. To tackle these challenges, we propose PI-Whisper, a novel ASR system that adaptively enhances recognition capabilities by identifying speakers' characteristics in real time. In this work, we show how the design of PI-Whisper allows for incremental adaptation to new characteristics without the need for repetitive retraining, enhances recognition capabilities, and improves equity and fairness across diverse speaker groups. PI-Whisper demonstrates these advantages by achieving state-of-the-art accuracy, reducing the word error rate (WER) by up to 13.7% relative to baselines while scaling linearly with computing resources.
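To make the headline metric concrete, here is a minimal sketch, not taken from the paper, of how WER and a relative WER reduction are commonly computed. The function names and the example sentences are illustrative assumptions, and the paper's own evaluation pipeline (for example, its text normalization) may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration: splits on whitespace and applies
    no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative sentences only: one deleted word out of five reference words.
print(word_error_rate("turn on the kitchen light", "turn on kitchen light"))  # 0.2
```

Under this definition, a relative reduction of 13.7% means the new system's WER is 0.863 times the baseline WER. It does not mean an absolute drop of 13.7 percentage points.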