PIAST: A Multimodal Piano Dataset with Audio, Symbolic and Text
PIAST is a valuable resource for Music Information Retrieval (MIR) research because it fills a gap in piano music datasets, though the contribution is incremental: it builds on existing data collection and annotation methods.
The authors address the lack of text-labeled datasets for solo piano music by creating PIAST, a multimodal dataset of 9,673 tracks collected from YouTube, with human annotations for 2,023 tracks. The dataset includes audio, text, and MIDI data, and the authors report baseline performances for music tagging and retrieval.
While piano music has become a significant area of study in Music Information Retrieval (MIR), there is a notable lack of datasets for piano solo music with text labels. To address this gap, we present PIAST (PIano dataset with Audio, Symbolic, and Text), a piano music dataset. Using a piano-specific taxonomy of semantic tags, we collected 9,673 tracks from YouTube and added human annotations by music experts for 2,023 tracks, resulting in two subsets: PIAST-YT and PIAST-AT. Both include audio, text, tag annotations, and MIDI transcribed with state-of-the-art piano transcription and beat tracking models. Among the many possible tasks with this multimodal dataset, we conduct music tagging and retrieval using both audio and MIDI data and report baseline performances to demonstrate its potential as a valuable resource for MIR research.
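To make the described structure concrete, the following is a minimal sketch of how one track with its audio, MIDI, text, and tag annotations might be represented, and how its tags could be turned into a multi-label target for tagging. The field names, file paths, and the four-tag vocabulary are assumptions made for illustration only; the abstract does not specify the actual PIAST file layout or taxonomy.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical tag vocabulary for illustration; not the actual PIAST taxonomy.
TAXONOMY = ["calm", "jazz", "classical", "happy"]


@dataclass
class Track:
    """One dataset entry; all field names here are assumed, not from the paper."""
    track_id: str
    subset: str       # "PIAST-YT" or "PIAST-AT"
    audio_path: str   # audio recording
    midi_path: str    # MIDI transcribed from the audio
    text: str         # free-text description
    tags: List[str] = field(default_factory=list)


def multi_hot(tags, taxonomy=TAXONOMY):
    """Encode a tag list as a 0/1 vector over the taxonomy (multi-label target)."""
    present = set(tags)
    return [1 if tag in present else 0 for tag in taxonomy]


# A made-up example entry, not real data from the dataset.
example = Track(
    track_id="demo-0001",
    subset="PIAST-AT",
    audio_path="audio/demo-0001.wav",
    midi_path="midi/demo-0001.mid",
    text="A slow, calm jazz piano piece.",
    tags=["calm", "jazz"],
)
target = multi_hot(example.tags)
```

A tagging model would be trained to predict a vector like `target` from either the audio or the MIDI input, while retrieval would match the text against those same inputs.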