CL, SD, AS | Oct 29, 2022

End-to-end Spoken Language Understanding with Tree-constrained Pointer Generator

arXiv:2210.16554v2 | 8 citations | h-index: 64
AI Analysis

This paper addresses the challenge of handling rare words in spoken language understanding for applications such as voice assistants. Its contribution is incremental, since it builds on existing biasing techniques.

The paper tackles the long-tail word problem in end-to-end spoken language understanding through contextual biasing with a tree-constrained pointer generator and slot probability biasing. On a split that holds out 5 slot types for testing, it reaches an SLU-F1 score above 50% in the zero-shot setting, and it also improves intent classification accuracy.

End-to-end spoken language understanding (SLU) suffers from the long-tail word problem. This paper exploits contextual biasing, a technique to improve the speech recognition of rare words, in end-to-end SLU systems. Specifically, a tree-constrained pointer generator (TCPGen), a powerful and efficient biasing model component, is studied, which leverages a slot shortlist with corresponding entities to extract biasing lists. Meanwhile, to bias the SLU model output slot distribution, a slot probability biasing (SPB) mechanism is proposed to calculate a slot distribution from TCPGen. Experiments on the SLURP dataset showed consistent SLU-F1 improvements using TCPGen and SPB, especially on unseen entities. On a new split that holds out 5 slot types for testing, TCPGen with SPB achieved zero-shot learning with an SLU-F1 score over 50%, whereas the baselines cannot handle this setting. In addition to slot filling, the intent classification accuracy was also improved.
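
The abstract says that TCPGen extracts biasing lists from a slot shortlist with corresponding entities and constrains generation with a tree. The following is a minimal sketch of that idea, not the paper's implementation. All function names, the character-level tokens, the `"<end>"` marker, and the toy slot data are assumptions made for illustration. The interpolation step follows the general pointer-generator formulation (mixing a model distribution with a pointer distribution via a generation probability) and may differ from the exact form used in the paper. SPB is not shown.

```python
from typing import Dict, List, Sequence

END = "<end>"  # hypothetical marker for a complete biasing entry


def extract_biasing_list(slot_entities: Dict[str, List[str]],
                         slot_shortlist: Sequence[str]) -> List[str]:
    """Collect the entities of the shortlisted slot types, without duplicates."""
    biasing: List[str] = []
    for slot in slot_shortlist:
        for entity in slot_entities.get(slot, []):
            if entity not in biasing:
                biasing.append(entity)
    return biasing


def build_prefix_tree(entries: Sequence[str]) -> dict:
    """Build a nested-dict prefix tree; characters stand in for subword tokens."""
    root: dict = {}
    for entry in entries:
        node = root
        for token in entry:
            node = node.setdefault(token, {})
        node[END] = {}
    return root


def valid_next_tokens(tree: dict, prefix: Sequence[str]) -> List[str]:
    """Return the tokens the tree allows after `prefix` (empty if off-tree)."""
    node = tree
    for token in prefix:
        if token not in node:
            return []
        node = node[token]
    return sorted(node.keys())


def interpolate(model_probs: Dict[str, float],
                pointer_probs: Dict[str, float],
                p_gen: float) -> Dict[str, float]:
    """Mix the model and pointer distributions with generation probability p_gen."""
    vocab = set(model_probs) | set(pointer_probs)
    return {
        t: (1.0 - p_gen) * model_probs.get(t, 0.0) + p_gen * pointer_probs.get(t, 0.0)
        for t in vocab
    }


# Toy data (hypothetical): slot types mapped to known entities.
slot_entities = {"artist": ["adele", "abba"], "place": ["paris"], "food": ["pasta"]}
biasing_list = extract_biasing_list(slot_entities, ["artist", "place"])
tree = build_prefix_tree(biasing_list)
print(valid_next_tokens(tree, "a"))  # ['b', 'd']
```

In this sketch the pointer distribution would only place mass on the tokens returned by `valid_next_tokens`, which is what keeps the biasing component restricted to entries in the biasing list.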

Code Implementations: 1 repo