An Investigation of Few-Shot Learning in Spoken Term Classification
This work addresses a domain-specific challenge in speech processing by making few-shot learning more flexible, though the contribution is incremental because it builds on the existing MAML algorithm.
The paper tackles the problem of few-shot learning for spoken term classification by relaxing the assumption that all classes are new, proposing a modified MAML algorithm that outperforms conventional supervised learning and original MAML on the Google Speech Commands dataset.
In this paper, we investigate the feasibility of applying few-shot learning algorithms to a speech task. We formulate a user-defined scenario of spoken term classification as a few-shot learning problem. Most few-shot learning studies assume that all N classes in an N-way problem are new. We suggest that this assumption can be relaxed and define an (N+M)-way problem, where N and M are the numbers of new classes and fixed classes, respectively. We propose a modification to the Model-Agnostic Meta-Learning (MAML) algorithm to solve this problem. Experiments on the Google Speech Commands dataset show that our approach outperforms both the conventional supervised learning approach and the original MAML.
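To make the (N+M)-way formulation concrete, the following is a minimal sketch of how one training episode might be assembled. It is not the paper's implementation: the function name `sample_episode`, the label layout (new classes in slots 0..N-1, fixed classes in slots N..N+M-1), and the use of plain strings as stand-ins for audio clips are all assumptions made for illustration.

```python
import random


def sample_episode(new_pool, fixed_classes, n_new, k_shot, rng):
    """Build the support set of one hypothetical (N+M)-way, K-shot episode.

    new_pool:      dict of class name -> list of examples (candidate new classes)
    fixed_classes: dict of class name -> list of examples (the M fixed classes)
    Returns the N chosen class names and a list of (example, label) pairs.
    """
    # Draw N new classes; their label slots 0..N-1 change from episode to episode.
    chosen = rng.sample(sorted(new_pool), n_new)
    support = []
    for label, name in enumerate(chosen):
        for example in rng.sample(new_pool[name], k_shot):
            support.append((example, label))

    # Assumption: the M fixed classes keep the same slots N..N+M-1 in every episode.
    for offset, name in enumerate(sorted(fixed_classes)):
        for example in rng.sample(fixed_classes[name], k_shot):
            support.append((example, n_new + offset))
    return chosen, support


# Toy data: strings stand in for audio clips; the class names are illustrative only.
new_pool = {w: [f"{w}_{i}" for i in range(5)] for w in ["yes", "no", "up", "down"]}
fixed = {w: [f"{w}_{i}" for i in range(5)] for w in ["silence", "unknown"]}

chosen, support = sample_episode(new_pool, fixed, n_new=3, k_shot=2,
                                 rng=random.Random(0))
```

With N = 3, M = 2, and K = 2, the support set holds (N+M)·K = 10 labeled examples. Only the three new-class slots are reassigned between episodes, which is the sense in which the assumption that every class is new is relaxed.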