Zero-shot Learning and Knowledge Transfer in Music Classification and Tagging
This work addresses a limitation of fixed-label supervised learning in music analysis by enabling predictions for new categories without retraining, though it appears incremental because it extends the authors' prior research.
The authors tackle the problem of predicting unseen labels in music classification and tagging by applying zero-shot learning, which uses semantic side information to project audio and labels into a shared space. They then test how well the model generalizes by transferring knowledge across different music corpora.
Music classification and tagging are conducted through categorical supervised learning with a fixed set of labels. In principle, this approach cannot make predictions on unseen labels. Zero-shot learning addresses the problem by using side information about the semantic labels. We recently investigated zero-shot learning for music classification and tagging tasks by projecting both the audio space and the label space onto a single semantic space. In this work, we extend that study to verify the generalization ability of the zero-shot learning model by conducting knowledge transfer to different music corpora.
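The idea of projecting audio and labels onto a single semantic space can be illustrated with a minimal sketch. It is not the authors' model: it assumes a simple ridge-regularized linear projection in place of whatever network the paper uses, synthetic "audio" features, and hand-made two-dimensional label vectors standing in for the semantic side information. All function names (`fit_audio_projection`, `predict_labels`, `demo`) are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def fit_audio_projection(audio_feats, label_vecs, ridge=1e-3):
    """Fit a linear map W (assumed stand-in for the audio encoder) so that
    audio_feats @ W approximates the semantic vector of each clip's label."""
    dim = audio_feats.shape[1]
    gram = audio_feats.T @ audio_feats + ridge * np.eye(dim)
    return np.linalg.solve(gram, audio_feats.T @ label_vecs)


def predict_labels(audio_feats, W, candidate_vecs):
    """Project audio into the semantic space and pick the closest candidate label."""
    projected = l2_normalize(audio_feats @ W)
    candidates = l2_normalize(candidate_vecs)
    scores = projected @ candidates.T  # cosine similarity, one row per clip
    return scores.argmax(axis=1)


def demo(seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical semantic vectors: rows 0-2 are seen labels, row 3 is unseen.
    label_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
    # Synthetic generative assumption: audio features are a noisy linear
    # function of the label's semantic vector.
    mixing = rng.normal(size=(2, 6))

    def make_clips(label_ids):
        clean = label_vecs[label_ids] @ mixing
        return clean + 0.01 * rng.normal(size=clean.shape)

    # Train only on clips of the three seen labels.
    train_ids = np.repeat(np.arange(3), 60)
    W = fit_audio_projection(make_clips(train_ids), label_vecs[train_ids])

    # At test time the unseen label only needs a semantic vector, no audio examples.
    test_ids = np.full(20, 3)
    return predict_labels(make_clips(test_ids), W, label_vecs)
```

Calling `demo()` returns the predicted label index for each test clip of the unseen label. In this sketch the same `W` could be applied to features from a different corpus, given semantic vectors for that corpus's labels, which is the sense in which the knowledge is transferred. It only works here because the seen label vectors span the semantic space and the synthetic audio is linear in them. Real audio would need a learned nonlinear encoder.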