SD · LG · AS · Sep 15, 2018

Attention as a Perspective for Learning Tempo-invariant Audio Queries

arXiv:1809.05689v1
Originality: Incremental advance
AI Analysis

This work addresses a specific issue in music information retrieval for classical piano, offering an incremental improvement over existing methods.

The paper tackles tempo mismatch in audio-sheet music retrieval by introducing a soft attention mechanism that focuses on the relevant parts of the audio query, which improves retrieval performance on classical piano music.

Current models for audio-sheet music retrieval via multimodal embedding space learning use convolutional neural networks with a fixed-size window for the input audio. Depending on the tempo of a query performance, this window captures more or less musical content, while notehead density in the score is largely tempo-independent. In this work we address this disparity with a soft attention mechanism, which allows the model to encode only those parts of an audio excerpt that are most relevant with respect to efficient query codes. Empirical results on classical piano music indicate that attention is beneficial for retrieval performance, and exhibits intuitively appealing behavior.
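To make the idea of soft attention over an audio excerpt concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the helper names `softmax` and `attend`, the toy frame values, and the use of per-frame energy as the relevance score are all hypothetical stand-ins. In a trained model the scores would come from a learned network. The sketch only shows how a softmax over time frames down-weights the less relevant parts of a fixed-size window.

```python
import math


def softmax(scores):
    """Turn arbitrary per-frame scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frames, frame_scores):
    """Scale every time frame of an excerpt by its attention weight.

    frames:       list of frames, each a list of frequency-bin values
    frame_scores: one relevance score per frame (hypothetical input;
                  a learned attention network would produce these)
    """
    weights = softmax(frame_scores)
    weighted = [[w * v for v in frame] for w, frame in zip(weights, frames)]
    return weighted, weights


# Toy excerpt with 4 time frames and 3 frequency bins (made-up values).
frames = [
    [0.0, 0.1, 0.0],
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.0, 0.0, 0.1],
]

# Assumption: frame energy stands in for a learned relevance score.
frame_scores = [sum(frame) for frame in frames]

weighted, weights = attend(frames, frame_scores)
```

Because the weights form a probability distribution over time, frames with low scores contribute little to whatever encoder consumes `weighted`, while the window size itself stays fixed.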
