SD, AI, LG, AS | Aug 22, 2024

Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music

arXiv:2408.12658v2 | 2 citations | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the problem of capturing the expressive melodic intricacies of Hindustani vocal music, which prior coarse symbolic models miss. It is an incremental improvement over those models and is relevant to both musicians and AI researchers.

The paper tackles generative modeling of melodic vocal contours in Hindustani classical music. It proposes GaMaDHaNi, a hierarchical model that uses finely quantized pitch contours as an intermediate representation. The model outperformed non-hierarchical models and hierarchical models with a self-supervised intermediate representation in listening tests, and achieved a Pearson correlation coefficient of 0.85 for pitch contour fidelity.

Hindustani music is a performance-driven oral tradition that exhibits the rendition of rich melodic patterns. In this paper, we focus on generative modeling of singers' vocal melodies extracted from audio recordings, as the voice is musically prominent within the tradition. Prior generative work in Hindustani music models melodies as coarse discrete symbols, which fails to capture the rich expressive melodic intricacies of singing. Thus, we propose to use a finely quantized pitch contour as an intermediate representation for hierarchical audio modeling. We propose GaMaDHaNi, a modular two-level hierarchy consisting of a generative model on pitch contours and a pitch-contour-to-audio synthesis model. We compare our approach to non-hierarchical audio models and to hierarchical models that use a self-supervised intermediate representation, through a listening test and qualitative analysis. We also evaluate the audio model's ability to faithfully represent the pitch contour input using the Pearson correlation coefficient. By using pitch contours as an intermediate representation, we show that our model may be better equipped to listen and respond to musicians in a human-AI collaborative setting by highlighting two potential interaction use cases: (1) primed generation, and (2) coarse pitch conditioning.
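
To make the two technical ideas in the abstract concrete, the following is a minimal sketch of (a) quantizing a pitch contour into fine bins and (b) scoring contour fidelity with the Pearson correlation coefficient. It is not the authors' implementation. The 10-cent bin width, the 220 Hz reference (tonic) frequency, the synthetic test contour, and all function names are assumptions chosen for illustration; the abstract does not specify them. In the paper the correlation compares the input contour with the pitch of the synthesized audio, whereas here a dequantized contour stands in for that output.

```python
import math


def hz_to_cents(f_hz, ref_hz=220.0):
    """Convert a frequency in Hz to cents relative to ref_hz (assumed tonic)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def quantize_contour(f0_hz, ref_hz=220.0, bin_cents=10.0):
    """Map each voiced frame to an integer bin of width bin_cents (assumed value)."""
    return [round(hz_to_cents(f, ref_hz) / bin_cents) for f in f0_hz]


def dequantize_contour(bins, ref_hz=220.0, bin_cents=10.0):
    """Map integer bins back to the frequency at the centre of each bin."""
    return [ref_hz * 2.0 ** (b * bin_cents / 1200.0) for b in bins]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Synthetic contour (illustrative only): a 200-cent upward glide with vibrato.
contour_hz = [
    220.0 * 2.0 ** ((200.0 * t / 99 + 30.0 * math.sin(2 * math.pi * 5 * t / 100)) / 1200.0)
    for t in range(100)
]

bins = quantize_contour(contour_hz)
reconstructed_hz = dequantize_contour(bins)

# Compare in the cent domain so that the measure follows perceived pitch distance.
original_cents = [hz_to_cents(f) for f in contour_hz]
reconstructed_cents = [hz_to_cents(f) for f in reconstructed_hz]
fidelity = pearson(original_cents, reconstructed_cents)
print(f"Pearson correlation between original and reconstructed contour: {fidelity:.4f}")
```

With bins this fine, the rounding error per frame is at most half a bin, so glides and vibrato survive quantization; a coarse note-level symbol grid would flatten them, which is the limitation of prior work that the abstract points to.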

