Locale Encoding For Scalable Multilingual Keyword Spotting Models
This work addresses the high development and maintenance costs of multilingual keyword spotting systems and offers a scalable solution for voice-activated devices, though the method itself is incremental.
The paper tackles the problem of scaling keyword spotting to multiple languages by proposing locale-conditioned universal models, which improve accuracy across 10 locales. The best variant, FiLM, reduces the false rejection rate by 61% on average (relative) compared to monolingual models of similar size.
A Multilingual Keyword Spotting (KWS) system detects spoken keywords over multiple locales. Conventional monolingual KWS approaches do not scale well to multilingual scenarios because of high development/maintenance costs and lack of resource sharing. To overcome this limit, we propose two locale-conditioned universal models with locale feature concatenation and feature-wise linear modulation (FiLM). We compare these models with two baseline methods: locale-specific monolingual KWS, and a single universal model trained over all data. Experiments over 10 localized language datasets show that locale-conditioned models substantially improve accuracy over baseline methods across all locales in different noise conditions. FiLM performed the best, improving on average FRR by 61% (relative) compared to monolingual KWS models of similar sizes.
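The two conditioning schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the locale is given as a one-hot vector, that the FiLM scale and shift are produced by a plain linear map of that vector, and that conditioning is applied per frame to a small feature vector. All function names, dimensions, and weight values below are hypothetical; the abstract does not say where in the network the conditioning is applied or how the parameters are generated.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def one_hot(index: int, size: int) -> Vector:
    """Encode a locale index as a one-hot vector (assumed locale encoding)."""
    if not 0 <= index < size:
        raise ValueError("locale index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def linear(x: Vector, weights: Matrix) -> Vector:
    """Multiply a length-n vector by an n x m weight matrix."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x)))
        for j in range(len(weights[0]))
    ]


def concat_locale(frames: Matrix, locale: Vector) -> Matrix:
    """Locale feature concatenation: append the locale vector to every frame."""
    return [frame + locale for frame in frames]


def film(frames: Matrix, locale: Vector, w_gamma: Matrix, w_beta: Matrix) -> Matrix:
    """Feature-wise linear modulation: scale and shift each feature channel
    with a locale-dependent gamma and beta."""
    gamma = linear(locale, w_gamma)  # per-channel scale for this locale
    beta = linear(locale, w_beta)    # per-channel shift for this locale
    return [
        [g * h + b for h, g, b in zip(frame, gamma, beta)]
        for frame in frames
    ]


if __name__ == "__main__":
    # Hypothetical toy setup: 2 frames, 2 feature channels, 3 locales.
    frames = [[1.0, 2.0], [3.0, 4.0]]
    locale = one_hot(1, 3)
    w_gamma = [[1.0, 1.0], [2.0, 0.5], [0.0, 1.0]]
    w_beta = [[0.0, 0.0], [1.0, -1.0], [0.0, 0.0]]
    print(concat_locale(frames, locale))
    print(film(frames, locale, w_gamma, w_beta))
```

Concatenation widens the input and leaves the network to learn how to use the locale, whereas FiLM lets the locale rescale and shift each feature channel directly. In a trained model the weights would be learned jointly with the rest of the network rather than fixed by hand as in this toy example.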