CL LG Feb 25, 2023

Locale Encoding For Scalable Multilingual Keyword Spotting Models

Pai Zhu, Hyun Jin Park, Alex Park, Angelo Scorza Scarpati, Ignacio Lopez Moreno

arXiv:2302.12961v11.77 citations h-index: 21

Originality: Incremental advance

AI Analysis

This paper addresses the high development and maintenance costs of multilingual keyword spotting systems, offering a scalable solution for voice-activated devices, though the method itself is incremental.

The paper tackled the problem of scaling keyword spotting to multiple languages by proposing locale-conditioned universal models, which improved accuracy across 10 locales. The best variant, FiLM, reduced the false rejection rate (FRR) by 61% on average, relative to monolingual models of similar size.

A Multilingual Keyword Spotting (KWS) system detects spoken keywords over multiple locales. Conventional monolingual KWS approaches do not scale well to multilingual scenarios because of high development/maintenance costs and lack of resource sharing. To overcome this limit, we propose two locale-conditioned universal models with locale feature concatenation and feature-wise linear modulation (FiLM). We compare these models with two baseline methods: locale-specific monolingual KWS, and a single universal model trained over all data. Experiments over 10 localized language datasets show that locale-conditioned models substantially improve accuracy over baseline methods across all locales in different noise conditions. FiLM performed the best, improving on average FRR by 61% (relative) compared to monolingual KWS models of similar sizes.
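The two conditioning schemes named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the one-hot locale encoding, the single linear projection that produces the FiLM scale and shift, the tensor shapes, and all function and parameter names below are assumptions made for illustration. The listing does not say where in the network the conditioning is applied.

```python
import numpy as np


def one_hot(locale_id, num_locales):
    """Encode a locale index as a one-hot vector (assumed encoding)."""
    vec = np.zeros(num_locales, dtype=np.float32)
    vec[locale_id] = 1.0
    return vec


def concat_condition(features, locale_vec):
    """Locale feature concatenation.

    features:   (T, D) array of per-frame acoustic features.
    locale_vec: (L,) locale encoding.
    Returns a (T, D + L) array with the locale vector appended to every frame.
    """
    num_frames = features.shape[0]
    tiled = np.broadcast_to(locale_vec, (num_frames, locale_vec.shape[0]))
    return np.concatenate([features, tiled], axis=1)


def film_condition(features, locale_vec, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation (FiLM).

    The locale vector is projected to a per-feature scale (gamma) and
    shift (beta), which are applied to every frame:
        output = gamma * features + beta

    w_gamma, w_beta: (L, D) hypothetical projection weights.
    b_gamma, b_beta: (D,) hypothetical projection biases.
    """
    gamma = locale_vec @ w_gamma + b_gamma  # shape (D,)
    beta = locale_vec @ w_beta + b_beta     # shape (D,)
    return gamma * features + beta          # broadcasts over the T frames
```

Concatenation widens the input by the size of the locale vector, while FiLM keeps the feature dimension unchanged and lets the locale rescale and shift each feature channel.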
