Nearly Zero-Shot Learning for Semantic Decoding in Spoken Dialogue Systems
This work addresses data scarcity in semantic decoding for spoken dialogue systems, but it is incremental as it builds on existing methods with specific assumptions.
The paper tackles the problem of scarce data for semantic decoding in spoken dialogue systems by proposing a deep learning architecture with joint weight optimization and an unsupervised tuning method, achieving improved F-Measure on the DSTC3 corpus.
This paper presents two ways of dealing with scarce data in semantic decoding using N-Best speech recognition hypotheses. First, we learn features by using a deep learning architecture in which the weights for the unknown and known categories are jointly optimised. Second, an unsupervised method is used for further tuning the weights. Sharing weights injects prior knowledge to unknown categories. The unsupervised tuning (i.e. the risk minimisation) improves the F-Measure when recognising nearly zero-shot data on the DSTC3 corpus. This unsupervised method can be applied subject to two assumptions: the rank of the class marginal is assumed to be known and the class-conditional scores of the classifier are assumed to follow a Gaussian distribution.