A Convolutional-Attentional Neural Framework for Structure-Aware Performance-Score Synchronization
This work addresses a domain-specific problem in music signal processing, audio-to-score alignment, and offers incremental improvements over existing methods.
The paper tackles performance-score synchronization by proposing a convolutional-attentional neural framework that outperforms state-of-the-art methods across various score modalities and acoustic conditions, demonstrating robustness to structural differences.
Performance-score synchronization is an integral task in signal processing, which entails generating an accurate mapping between an audio recording of a performance and the corresponding musical score. Traditional synchronization methods compute alignment using knowledge-driven and stochastic approaches, and are typically unable to generalize well to different domains and modalities. We present a novel data-driven method for structure-aware performance-score synchronization. We propose a convolutional-attentional architecture trained with a custom loss based on time-series divergence. We conduct experiments on the audio-to-MIDI and audio-to-image alignment tasks, which correspond to different score modalities. We validate the effectiveness of our method via ablation studies and comparisons with state-of-the-art alignment approaches. We demonstrate that our approach outperforms previous synchronization methods for a variety of test settings across score modalities and acoustic conditions. Our method is also robust to structural differences between the performance and score sequences, a case that standard alignment approaches commonly fail to handle.
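To make the idea of a loss based on time-series divergence concrete, the sketch below computes a soft-DTW divergence between two feature sequences in plain Python. This is an assumption made for illustration: the abstract does not state which divergence is used, and soft-DTW divergence is only one established choice. The function names (`soft_min`, `soft_dtw`, `soft_dtw_divergence`), the squared Euclidean frame cost, and the smoothing parameter `gamma` are all hypothetical and are not taken from the paper.

```python
import math


def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma)))."""
    # Shift by the true minimum for numerical stability; infinite
    # entries contribute exp(-inf) == 0.0 and drop out of the sum.
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value between sequences x and y (lists of feature vectors)."""
    n, m = len(x), len(y)
    # r[i][j] holds the soft alignment cost of x[:i] against y[:j].
    r = [[math.inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = (r[i - 1][j], r[i][j - 1], r[i - 1][j - 1])
            r[i][j] = sq_dist(x[i - 1], y[j - 1]) + soft_min(prev, gamma)
    return r[n][m]


def soft_dtw_divergence(x, y, gamma=1.0):
    """Soft-DTW debiased by the self-alignment terms, so D(x, x) == 0."""
    return soft_dtw(x, y, gamma) - 0.5 * (soft_dtw(x, x, gamma) + soft_dtw(y, y, gamma))


# Toy 1-D "features": the performance repeats one score event.
performance = [[0.0], [1.0], [1.0], [2.0]]
score = [[0.0], [1.0], [2.0]]
unrelated = [[5.0], [6.0], [7.0]]

print(soft_dtw_divergence(performance, score))
print(soft_dtw_divergence(performance, unrelated))
```

In a training setting, a quantity of this kind would be computed on learned frame representations with automatic differentiation instead of on raw lists. The loops above are only meant to show the recurrence and why the divergence stays small for sequences that differ mainly in timing.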