An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire

arXiv:2605.06685 · 3.91 citations

Predicted impact: top 99% in SD · last 90 days · Originality: Incremental advance

AI Analysis

For musicologists and researchers in computational music analysis, the paper provides a validated method for quantifying compositional style from audio, although its findings largely confirm already known stylistic relationships.

The paper presents a pipeline that produces composer-level information-theoretic profiles from audio recordings, built on a transcription layer with certified accuracy (F1 = 0.9791 on the MAESTRO v3.0.0 test set). Applied to 1,238 pieces, it orders composers by harmonic predictability, recovers known stylistic lineages, and separates neoclassical from historical composers by the quality of their Zipfian fit (mean R² = 0.78 vs. 0.46).

We present an audio-to-analysis pipeline that produces composer-level information-theoretic profiles from raw recordings; these profiles reflect compositional vocabulary as it emerges from aggregated performances. The pipeline is built on a transcription layer whose accuracy we certify on a standard benchmark (F1 = 0.9791 on the MAESTRO v3.0.0 test set). Applied to 1,238 pieces and to the 15 MAESTRO composers with at least ten attributed pieces, spanning the Baroque through the early twentieth century, the pipeline derives empirical distributions over harmonic scale degrees and analyzes them with Shannon entropy, asymmetric Kullback–Leibler (KL) divergence, and Zipfian rank-frequency modeling. The resulting profiles (i) order composers along an interpretable axis of harmonic predictability, with a narrow entropy range (3.33–3.86 bits) that reveals how similar tonal vocabularies are at the marginal level; (ii) recover known stylistic lineages (Haydn–Beethoven, Liszt–Rachmaninoff, Schubert–Schumann) as the smallest KL divergences in the corpus, with Mendelssohn emerging as a stable outlier within this corpus; and (iii) separate contemporary neoclassical artists (Richter, Frahm, Glass, Arnalds, Jóhannsson) from historical composers by the quality of the Zipfian fit to the transition distribution, with mean $R^2 = 0.78$ for neoclassical versus $0.46$ for historical composers ($N \geq 10$ pieces each). This gap is larger than the spread within either group and is consistent with a minimalist compositional tendency: a compact transition vocabulary used with sharper frequency-rank regularity than that of historical composers. All estimates are reported with Laplace-smoothed bootstrap 95% confidence intervals.
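
A minimal sketch of how the entropy and asymmetric KL measures named above could be computed from per-composer sequences of scale-degree labels. The label alphabet, the add-one (Laplace) pseudo-count, the helper names, and the toy sequences are illustrative assumptions, not the paper's implementation. The alphabet is passed in as a parameter rather than fixed at twelve pitch classes, since the reported maximum of 3.86 bits already exceeds log2 12 ≈ 3.58 bits.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def smoothed_distribution(labels: Iterable[Hashable],
                          alphabet: Sequence[Hashable],
                          alpha: float = 1.0) -> Dict[Hashable, float]:
    """Laplace (add-alpha) smoothed relative frequencies over a fixed alphabet.

    Labels outside `alphabet` are ignored; every symbol in `alphabet`
    receives pseudo-count `alpha`, so no probability is exactly zero.
    """
    counts = Counter(labels)
    total = sum(counts[a] for a in alphabet) + alpha * len(alphabet)
    return {a: (counts[a] + alpha) / total for a in alphabet}


def shannon_entropy(p: Dict[Hashable, float]) -> float:
    """H(P) = -sum_x P(x) * log2 P(x), in bits."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def kl_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """D_KL(P || Q) = sum_x P(x) * log2(P(x) / Q(x)), in bits.

    Asymmetric: D_KL(P || Q) generally differs from D_KL(Q || P).
    Assumes q[x] > 0 wherever p[x] > 0, which smoothing guarantees.
    """
    return sum(pv * math.log2(pv / q[x]) for x, pv in p.items() if pv > 0)


# Hypothetical label set and toy sequences, for illustration only.
alphabet = ["I", "II", "III", "IV", "V", "VI", "VII"]
composer_a = ["I", "V", "I", "IV", "V", "I"]
composer_b = ["I", "VI", "II", "V", "I", "III", "VII"]

pa = smoothed_distribution(composer_a, alphabet)
pb = smoothed_distribution(composer_b, alphabet)
print(f"H(A) = {shannon_entropy(pa):.3f} bits, H(B) = {shannon_entropy(pb):.3f} bits")
print(f"D(A||B) = {kl_divergence(pa, pb):.3f}, D(B||A) = {kl_divergence(pb, pa):.3f}")
```

To look for stylistic neighbours in the spirit of the lineage result, one could compute `kl_divergence` for every ordered pair of composers and rank the smallest values; because the measure is asymmetric, both directions are worth inspecting.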
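
The Zipfian comparison in (iii) rests on how well a transition distribution's rank-frequency curve follows a power law. A common way to score this, assumed here since the abstract does not give the fitting details, is an ordinary least-squares line through log(frequency) versus log(rank), reporting the coefficient of determination R². The first-order transition definition, the helper names, and the toy piece are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def transition_counts(sequence: Sequence[Hashable]) -> Counter:
    """Count first-order transitions (x_t, x_{t+1}) in one label sequence."""
    return Counter(zip(sequence, sequence[1:]))


def zipf_fit(counts: Counter) -> Tuple[float, float]:
    """Least-squares fit of log10(frequency) = c - s * log10(rank).

    Returns (s, r_squared): the Zipf exponent and the R^2 of the
    log-log linear fit. R^2 is NaN when every count is identical.
    """
    freqs = sorted((c for c in counts.values() if c > 0), reverse=True)
    if len(freqs) < 3:
        raise ValueError("need at least three distinct transitions to fit")
    xs = [math.log10(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log10(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return -slope, r_squared


# Hypothetical scale-degree sequence for a single piece.
piece = ["I", "IV", "V", "I", "IV", "V", "I", "VI", "IV", "V", "I"]
s, r2 = zipf_fit(transition_counts(piece))
print(f"Zipf exponent s = {s:.2f}, R^2 = {r2:.2f}")
```

Because both axes use the same logarithm base, the choice of base changes neither the fitted exponent nor R². Whether the paper fits per piece and then averages, or fits a pooled per-composer distribution, is not stated in the abstract.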
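
The abstract reports every estimate with a Laplace-smoothed bootstrap 95% confidence interval. The sketch below shows one plausible reading of that procedure for a composer's entropy. It resamples whole pieces with replacement, re-applies add-one smoothing to each replicate, and takes a percentile interval. The resampling unit, the number of replicates, the percentile method, and all names are assumptions made for illustration.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def smoothed_entropy(labels: Sequence[Hashable],
                     alphabet: Sequence[Hashable],
                     alpha: float = 1.0) -> float:
    """Shannon entropy (bits) of the add-alpha smoothed label distribution."""
    counts = Counter(labels)
    total = sum(counts[a] for a in alphabet) + alpha * len(alphabet)
    probs = [(counts[a] + alpha) / total for a in alphabet]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def bootstrap_entropy_ci(pieces: Sequence[Sequence[Hashable]],
                         alphabet: Sequence[Hashable],
                         n_boot: int = 2000,
                         level: float = 0.95,
                         alpha: float = 1.0,
                         seed: int = 0) -> Tuple[float, float, float]:
    """Point estimate plus a percentile bootstrap interval for pooled entropy.

    Pieces (not individual labels) are resampled with replacement, and
    each replicate is re-smoothed before its entropy is computed.
    """
    rng = random.Random(seed)  # fixed seed for reproducible intervals

    def pooled(sample: Sequence[Sequence[Hashable]]) -> List[Hashable]:
        return [label for piece in sample for label in piece]

    point = smoothed_entropy(pooled(pieces), alphabet, alpha)
    replicates = sorted(
        smoothed_entropy(pooled(rng.choices(pieces, k=len(pieces))), alphabet, alpha)
        for _ in range(n_boot)
    )
    # Nearest-rank percentiles over replicate indices 0 .. n_boot - 1.
    lo_idx = round((1 - level) / 2 * (n_boot - 1))
    hi_idx = round((1 + level) / 2 * (n_boot - 1))
    return point, replicates[lo_idx], replicates[hi_idx]


# Hypothetical label set and pieces for a single composer.
alphabet = ["I", "II", "III", "IV", "V", "VI", "VII"]
pieces = [
    ["I", "V", "I", "IV", "V", "I"],
    ["I", "VI", "II", "V", "I"],
    ["I", "IV", "I", "V", "V", "I"],
    ["I", "III", "VI", "II", "V", "I"],
]
point, lo, hi = bootstrap_entropy_ci(pieces, alphabet)
print(f"H = {point:.3f} bits, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Resampling whole pieces keeps within-piece dependence intact; resampling individual labels instead would treat every note or chord as independent.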
