A Comprehensive Corpus of Biomechanically Constrained Piano Chords: Generation, Analysis, and Implications for Voicing and Psychoacoustics
This work addresses the need for a comprehensive dataset for piano chord research, enabling future studies in generative modeling and psychoacoustics, though it is incremental in building upon existing chord analysis methods.
The authors tackled the problem of generating and analyzing a large corpus of playable piano chords under biomechanical constraints, resulting in approximately 19.3 million entries and showing that voicing skewness is about 5.8 times more effective than spread at predicting dissonance.
I present the generation and analysis of the largest known open-source corpus of playable piano chords (approximately 19.3 million entries). The corpus enumerates the two-handed search space under biomechanical constraints (two hands, each with a 1.5-octave reach) to an unprecedented extent. To demonstrate its utility, I modeled the relationship between voicing shape and psychoacoustic targets. Harmonicity proved intrinsic to pitch-class identity: voicing statistics added negligible explained variance ($\Delta R^2 \approx 0.014\%$, $p \approx 0.13$). Conversely, voicing significantly predicted dissonance ($\Delta R^2 \approx 6.75\%$, $p \approx 0.0008$). Crucially, the coefficient for skewness ($\beta \approx +0.145$) was approximately 5.8$\times$ larger in magnitude than that for spread ($\beta \approx -0.025$) in predicting roughness, which challenges the pedagogical emphasis on ``spread''. This suggests that clarity in ``open voicings'' is driven less by width than by negative skewness: lower-register clearance is achieved by placing wide gaps at the bottom while allowing tighter clustering in the treble. These results illustrate how the corpus can support future research in generative modeling, voice-leading topology, and psychoacoustic analysis.
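To make the voicing statistics concrete, the following minimal sketch computes spread and skewness for a chord given as MIDI pitch numbers and checks a per-hand reach limit. The definitions are assumptions, since the abstract does not spell them out: spread is taken as the population standard deviation of the pitches, skewness as the standardized third moment, and a 1.5-octave reach as 18 semitones. The function names and the example voicings are hypothetical and are not drawn from the corpus.

```python
from statistics import fmean

# Assumption: a 1.5-octave reach is read as 18 semitones per hand.
REACH_SEMITONES = 18


def spread(pitches):
    """Population standard deviation of MIDI pitches (assumed definition of spread)."""
    mu = fmean(pitches)
    return fmean([(p - mu) ** 2 for p in pitches]) ** 0.5


def skewness(pitches):
    """Standardized third moment of MIDI pitches (assumed definition of skewness)."""
    mu = fmean(pitches)
    sd = spread(pitches)
    if sd == 0:
        return 0.0  # all notes identical: no asymmetry to measure
    return fmean([(p - mu) ** 3 for p in pitches]) / sd ** 3


def is_playable(left, right, reach=REACH_SEMITONES):
    """True if the notes assigned to each hand span at most `reach` semitones."""
    return all(max(hand) - min(hand) <= reach for hand in (left, right) if hand)


# Hypothetical voicing with wide gaps at the bottom and a treble cluster.
open_voicing = [36, 43, 64, 67, 71, 74]
# Same interval pattern mirrored: cluster at the bottom, wide gaps at the top.
mirrored = sorted(36 + 74 - p for p in open_voicing)

for name, chord in (("open", open_voicing), ("mirrored", mirrored)):
    print(name, chord, round(spread(chord), 3), round(skewness(chord), 3))
```

Both voicings have the same spread, but their skewness has opposite sign: the bottom-gapped voicing is negatively skewed and its mirror image is positively skewed. This is the distinction the regression result above relies on, since spread alone cannot separate the two shapes.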