Score-Agnostic Structure Analysis in Large-Scale Performance Datasets
For researchers in music performance analysis, this work addresses the problem of structural variability in automatically transcribed piano datasets, enabling meaningful comparative studies without requiring ground-truth scores.
The authors propose a score-agnostic method that uses sequence-to-sequence alignment and hierarchical clustering to group automatically transcribed piano performances by structural interpretation, so that performances in large-scale datasets can be compared validly. Applied to around 1,500 transcriptions of 88 compositions, the method clusters performances based on alignment cost and sequence length dissimilarity.
In recent years, thanks to advances in automatic music transcription (AMT), several large-scale datasets of automatically transcribed solo piano music have been released. While these datasets undoubtedly offer extensive material for performance studies, they vary substantially in quality. In the case of classical music, performances often differ not only in expressive aspects such as tempo, but also in their structural interpretation of the score (including repeat patterns and edition-specific variants). To use large-scale transcribed datasets meaningfully for performance research, transcriptions of the same piece must be grouped according to their underlying structural realisation, so that only structurally comparable performances are compared. We address this by applying sequence-to-sequence alignment followed by hierarchical clustering: we compute alignments for all pairs of transcriptions of a given piece, and use the alignment cost and the (dis)similarity of the performed sequence lengths as features for grouping, thereby resolving structural mismatches. We propose this approach as a first step towards automatically evaluating large-scale transcribed datasets that lack ground-truth scores and/or audio, shifting the evaluation criterion from truth-based accuracy to musical coherence and plausibility. We demonstrate our score-agnostic approach on around 1,500 transcriptions of 88 compositions from a recently published large-scale transcribed piano performance dataset.
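The pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the alignment algorithm, the feature weighting, or the linkage criterion, so everything below is an assumption made for illustration: dynamic time warping over MIDI pitch sequences stands in for the sequence-to-sequence alignment, the two features are combined with an equal-weight sum, and clustering uses average linkage with a hand-picked cut-off. The names `alignment_cost`, `length_dissimilarity`, `pairwise_distances`, `cluster`, `length_weight`, and `threshold` are hypothetical, and the toy pitch sequences are invented.

```python
from itertools import combinations


def alignment_cost(a, b):
    """DTW cost between two pitch sequences, normalised by their combined length.

    Assumption: a 0/1 local cost on pitch equality; the paper may use another cost.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [0.0] + [inf] * m  # row 0 of the DTW table
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            local = 0.0 if a[i - 1] == b[j - 1] else 1.0
            curr[j] = local + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m] / (n + m)


def length_dissimilarity(a, b):
    """0 when both sequences have equal length, approaching 1 as they diverge."""
    return 1.0 - min(len(a), len(b)) / max(len(a), len(b))


def pairwise_distances(performances, length_weight=0.5):
    """Combine both features into one distance per pair (weighting is an assumption)."""
    dist = {}
    for i, j in combinations(range(len(performances)), 2):
        a, b = performances[i], performances[j]
        dist[(i, j)] = ((1.0 - length_weight) * alignment_cost(a, b)
                        + length_weight * length_dissimilarity(a, b))
    return dist


def cluster(dist, n, threshold):
    """Average-linkage agglomerative clustering, stopped at a distance threshold."""
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return clusters


# Toy data: section A, section B, and performances with or without a repeat of A.
A = [60, 62, 64, 65]
B = [67, 65, 64, 62]
with_repeat = A + A + B
without_repeat = A + B
performances = [
    with_repeat,
    with_repeat[:5] + [63] + with_repeat[6:],        # one simulated transcription error
    without_repeat,
    without_repeat[:2] + [63] + without_repeat[3:],  # one simulated transcription error
]

dist = pairwise_distances(performances)
groups = cluster(dist, len(performances), threshold=0.1)
print(groups)
```

On this toy input, the two performances that take the repeat and the two that omit it land in separate groups, because the length term alone keeps every cross-group distance above the cut-off. For real transcriptions, a library implementation of hierarchical clustering and a tuned cut-off would be the more practical choice; the pure-Python version here only keeps the example dependency-free.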