Acoustics-specific Piano Velocity Estimation
This work addresses the inaccurate resynthesis of transcribed piano performances, a problem relevant to musicians and audio engineers. The contribution is incremental, since it builds on existing AMT with models tailored to specific instruments and acoustic contexts.
The paper tackles the problem that existing Automatic Music Transcription (AMT) methods fail to preserve artistic content when a transcribed piano performance is resynthesized, because MIDI parameter mappings are instrument-specific and musicians adapt their playing to the acoustic environment. It proposes an acoustics-specific AMT system that models these adaptations and generally outperforms the standard pipeline.
Motivated by state-of-the-art psychological research, we note that a piano performance transcribed with existing Automatic Music Transcription (AMT) methods cannot be successfully resynthesized without affecting the artistic content of the performance. This is due to 1) the different mappings between MIDI parameters used by different instruments, and 2) the fact that musicians adapt their way of playing to the surrounding acoustic environment. To address this issue, we propose a methodology to build acoustics-specific AMT systems that are able to model the adaptations that musicians apply to convey their interpretation. Specifically, we train models tailored to virtual instruments in a modular architecture that takes as input an audio recording and the corresponding aligned music score, and outputs the acoustics-specific velocity of each note. We test different model shapes and show that the proposed methodology generally outperforms the usual AMT pipeline, which does not consider the specificities of the instrument and of the acoustic environment. Interestingly, the methodology is straightforward to extend, since only slight effort is required to train models that infer other piano parameters, such as pedaling.
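To make the modular idea concrete, the following is a minimal sketch rather than the system described above. It assumes that each note of the aligned score is reduced to a single level feature (RMS in dB) and that each acoustic context has its own mapping from that feature to a MIDI velocity. The names `Note`, `note_level_db`, `make_linear_model`, `MODELS`, and `estimate_velocities`, as well as the linear mappings and their coefficients, are hypothetical placeholders for trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

import numpy as np


@dataclass
class Note:
    pitch: int     # MIDI pitch, taken from the aligned score
    onset: float   # seconds
    offset: float  # seconds


def note_level_db(audio: np.ndarray, sr: int, note: Note) -> float:
    """RMS level in dB of the audio segment spanned by one score note."""
    start = int(note.onset * sr)
    end = max(start + 1, int(note.offset * sr))
    segment = audio[start:end]
    if segment.size == 0:  # note lies outside the recording
        return -180.0
    rms = np.sqrt(np.mean(segment ** 2))
    return float(20.0 * np.log10(rms + 1e-9))


def make_linear_model(slope: float, intercept: float) -> Callable[[float], int]:
    """Hypothetical stand-in for a trained, context-specific velocity model."""
    def model(level_db: float) -> int:
        velocity = round(slope * level_db + intercept)
        return int(np.clip(velocity, 1, 127))  # valid MIDI note-on velocities
    return model


# One model per acoustic context (instrument plus environment).
# The coefficients are made up for illustration only.
MODELS: Dict[str, Callable[[float], int]] = {
    "instrument_a": make_linear_model(2.0, 127.0),
    "instrument_b": make_linear_model(3.0, 127.0),
}


def estimate_velocities(audio: np.ndarray, sr: int,
                        notes: List[Note], context: str) -> List[int]:
    """Audio and aligned score in, one context-specific velocity per note out."""
    model = MODELS[context]
    return [model(note_level_db(audio, sr, n)) for n in notes]


if __name__ == "__main__":
    sr = 1000
    audio = 0.1 * np.ones(sr)  # one second of constant-level signal
    score = [Note(pitch=60, onset=0.0, offset=0.5)]
    for context in MODELS:
        print(context, estimate_velocities(audio, sr, score, context))
```

In this sketch the same recorded note receives a different velocity depending on the selected context, which is the behavior an acoustics-specific system is meant to capture. Under the same assumptions, replacing a per-context mapping with one that predicts another parameter, such as pedaling, would leave the rest of the pipeline unchanged.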