Boundary Regression for Leitmotif Detection in Music Audio
The work addresses a challenging task in music information retrieval, with applications in music analysis and retrieval, but it appears incremental, as it adapts an existing visual object detection method to a specific audio domain.
The paper tackles the problem of detecting leitmotifs in music audio by framing it as a boundary regression task rather than frame-level prediction, aiming to preserve musical integrity and produce more useful predictions.
Leitmotifs are musical phrases that are reprised in various forms throughout a piece. Due to diverse variations and instrumentation, detecting the occurrence of leitmotifs from audio recordings is a highly challenging task. Leitmotif detection may be handled as a subcategory of audio event detection, where leitmotif activity is predicted at the frame level. However, as leitmotifs embody distinct, coherent musical structures, a more holistic approach akin to bounding box regression in visual object detection can be helpful. This method captures the entirety of a motif rather than fragmenting it into individual frames, thereby preserving its musical integrity and producing more useful predictions. We present our experimental results on tackling leitmotif detection as a boundary regression task.
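To make the contrast between frame-level prediction and boundary regression concrete, the following is a minimal sketch and not the authors' implementation. It assumes a leitmotif occurrence is represented as a (start, end) interval in seconds, and it uses a one-dimensional intersection-over-union (IoU), the temporal analogue of bounding box IoU, to compare intervals. The names `frames_to_intervals` and `interval_iou`, the hop size, and the toy values are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Assumption: a leitmotif occurrence is a (start_sec, end_sec) interval.
Interval = Tuple[float, float]


def frames_to_intervals(active: List[bool], hop_sec: float) -> List[Interval]:
    """Merge runs of consecutive active frames into (start, end) intervals."""
    intervals: List[Interval] = []
    start = None  # index of the first frame of the current active run
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            intervals.append((start * hop_sec, i * hop_sec))
            start = None
    if start is not None:
        # The last active run extends to the end of the sequence.
        intervals.append((start * hop_sec, len(active) * hop_sec))
    return intervals


def interval_iou(a: Interval, b: Interval) -> float:
    """1-D intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Toy data: one reference motif and a frame-level prediction with a
# single missed frame in the middle (hop size of 1 second, hypothetical).
reference: Interval = (1.0, 7.0)
frame_prediction = [False, True, True, False, True, True, True, False]

# The missed frame splits the motif into two fragments.
fragments = frames_to_intervals(frame_prediction, hop_sec=1.0)
fragment_ious = [interval_iou(f, reference) for f in fragments]

# A hypothetical regressed boundary covering the motif as one unit.
regressed: Interval = (1.5, 7.0)
regressed_iou = interval_iou(regressed, reference)
```

In this toy case a single missed frame breaks the frame-level output into two partial detections, each of which overlaps the reference only partly, whereas one regressed interval is scored against the motif as a whole. This is only meant to illustrate the motivation stated in the abstract; the paper's actual model, loss, and evaluation may differ.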