CL SD AS | Oct 29, 2022

End-to-end Spoken Language Understanding with Tree-constrained Pointer Generator

Guangzhi Sun, Chao Zhang, Philip C. Woodland

arXiv:2210.16554v21.68 citations | h-index: 64 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of handling rare words in spoken language understanding for applications like voice assistants, though the advance is incremental because it builds on existing biasing techniques.

The paper tackles the long-tail word problem in end-to-end spoken language understanding by using contextual biasing with a tree-constrained pointer generator and slot probability biasing. On a split that holds out 5 slot types for testing, it achieves an SLU-F1 score over 50% in zero-shot learning, and it also improves intent classification accuracy.

End-to-end spoken language understanding (SLU) suffers from the long-tail word problem. This paper exploits contextual biasing, a technique to improve the speech recognition of rare words, in end-to-end SLU systems. Specifically, a tree-constrained pointer generator (TCPGen), a powerful and efficient biasing model component, is studied, which leverages a slot shortlist with corresponding entities to extract biasing lists. Meanwhile, to bias the SLU model output slot distribution, a slot probability biasing (SPB) mechanism is proposed to calculate a slot distribution from TCPGen. Experiments on the SLURP dataset showed consistent SLU-F1 improvements using TCPGen and SPB, especially on unseen entities. On a new split that holds out 5 slot types for testing, TCPGen with SPB achieved zero-shot learning with an SLU-F1 score over 50%, whereas the baselines cannot handle this setting. In addition to slot filling, the intent classification accuracy was also improved.
