ASSD · May 26, 2021

Exploiting Temporal Dependencies for Cross-Modal Music Piece Identification

arXiv:2105.12536v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of cross-modal retrieval for musicians and archivists, but it is incremental as it builds on existing embedding methods by adding temporal awareness.

The paper tackles cross-modal music piece identification by learning a cross-modal embedding space for audio and sheet music images. It introduces two strategies to incorporate temporal dependencies: aligning sequences of embeddings, and adding an attention mechanism that reduces the effects of tempo variation. Experiments on 24 hours of classical piano recordings show significant improvement in retrieval.

This paper addresses the problem of cross-modal musical piece identification and retrieval: finding the appropriate recording(s) in a database given a sheet music query, and vice versa, working directly with audio and scanned sheet music images. The fundamental approach is to learn a cross-modal embedding space with a suitable similarity structure for audio and sheet image snippets, using a deep neural network, and to identify candidate pieces by cross-modal near neighbour search in this space. However, this method is oblivious to the temporal aspects of music. In this paper, we introduce two strategies that address this shortcoming. First, we present a strategy that aligns sequences of embeddings learned from sheet music scans and audio snippets. A series of experiments on whole-piece and fragment-level retrieval on 24 hours' worth of classical piano recordings demonstrates significant improvement. Second, we show that the retrieval can be further improved by introducing an attention mechanism to the embedding learning model that reduces the effects of tempo variations in music. To conclude, we assess the scalability of our method and discuss potential measures to make it suitable for truly large-scale applications.
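
To make the first strategy concrete, here is a minimal sketch of ranking candidate pieces by aligning a query sequence of embeddings against each candidate sequence. The abstract does not name the alignment algorithm, so dynamic time warping (DTW) over cosine distances is an assumption made here for illustration. The function names (`cosine_distance_matrix`, `dtw_cost`, `rank_pieces`) and the dictionary-based database are hypothetical, and the embeddings are assumed to be precomputed by some cross-modal network.

```python
import numpy as np


def cosine_distance_matrix(a, b):
    """Pairwise cosine distances between rows of a (n x d) and b (m x d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def dtw_cost(query, candidate):
    """Length-normalised DTW cost between two embedding sequences.

    Assumption: plain DTW with unit step sizes; the paper's actual
    alignment procedure may differ.
    """
    d = cosine_distance_matrix(query, candidate)
    n, m = d.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three admissible predecessor paths.
            acc[i, j] = d[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m] / (n + m)


def rank_pieces(query, database):
    """Return piece names sorted from best to worst alignment cost.

    `database` is a hypothetical dict mapping piece name -> (T x d) array.
    """
    costs = {name: dtw_cost(query, seq) for name, seq in database.items()}
    return sorted(costs, key=costs.get)
```

Because DTW allows one query frame to match several candidate frames (and vice versa), a query played at a different tempo can still align cheaply with the correct piece, which is the kind of temporal structure that independent snippet-wise lookups ignore. The quadratic cost per candidate is also one reason scalability becomes a concern for large databases.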

Code Implementations: 1 repo
