VioPTT: Violin Technique-Aware Transcription from Synthetic Data Augmentation
This work addresses a limitation of existing transcription models: they do not capture instrument-specific nuances such as violin playing techniques, which matter to musicians and music researchers. The contribution is incremental, as it builds on prior transcription methods.
The authors tackled the problem of automatic music transcription for violin by developing VioPTT, a model that transcribes pitch, timing, and playing technique, achieving state-of-the-art performance and strong generalization to real-world recordings. They also released a synthetic dataset, MOSA-VPT, to address the lack of labeled data.
While automatic music transcription is well-established in music information retrieval, most models are limited to transcribing pitch and timing information from audio, and thus omit crucial expressive and instrument-specific nuances. One example is playing technique on the violin, which affords the instrument a distinct palette of timbres for maximal emotional impact. Here, we propose VioPTT (Violin Playing Technique-aware Transcription), a lightweight, end-to-end model that directly transcribes violin playing technique in addition to pitch onset and offset. Furthermore, we release MOSA-VPT, a novel, high-quality synthetic violin playing technique dataset that circumvents the need for manually labeled annotations. Leveraging this dataset, our model generalizes strongly to real-world note-level violin technique recordings while also achieving state-of-the-art transcription performance. To our knowledge, VioPTT is the first model to combine violin transcription and playing technique prediction within a unified framework.