ASSD · Apr 19, 2021

Automatic Stroke Classification of Tabla Accompaniment in Hindustani Vocal Concert Audio

arXiv:2104.09064v1
Originality: Incremental advance
AI Analysis

This work addresses the need for instrument-independent stroke classification of tabla accompaniment to support corpus-level musicological studies. It is an incremental advance: it builds on existing methods and focuses mainly on data augmentation.

The authors address the automatic classification of tabla strokes in Hindustani vocal concert audio, with the goal of enabling large-scale musicological analysis. Their system predicts one of four stroke categories from acoustic features and uses data augmentation to compensate for limited labeled training data. This summary does not report specific accuracy figures.

The tabla is a unique percussion instrument due to the combined harmonic and percussive nature of its timbre, and the contrasting harmonic frequency ranges of its two drums. This allows a tabla player to uniquely emphasize parts of the rhythmic cycle (theka) in order to mark the salient positions. An analysis of the loudness dynamics and timing deviations at various cycle positions is an important part of musicological studies on the expressivity in tabla accompaniment. To achieve this at a corpus-level, and not restrict it to the few recordings that manual annotation can afford, it is helpful to have access to an automatic tabla transcription system. Although a few systems have been built by training models on labeled tabla strokes, the achieved accuracy does not necessarily carry over to unseen instruments. In this article, we report our work towards building an instrument-independent stroke classification system for accompaniment tabla based on the more easily available tabla solo audio tracks. We present acoustic features that capture the distinctive characteristics of tabla strokes and build an automatic system to predict the label as one of a reduced, but musicologically motivated, target set of four stroke categories. To address the lack of sufficient labeled training data, we turn to common data augmentation methods and find the use of pitch-shifting based augmentation to be most promising. We then analyse the important features and highlight the problem of their instrument-dependence while motivating the use of more task-specific data augmentation strategies to improve the diversity of training data.
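The abstract singles out pitch-shifting as the most promising augmentation. The sketch below illustrates the general idea only, under stated assumptions: each labeled stroke clip is copied at a few pitch offsets (roughly mimicking a tabla tuned to a different tonic), and the stroke label is kept unchanged. The function names and the shift values are hypothetical, not the paper's. For self-containment it uses naive linear-interpolation resampling, which also shortens or lengthens the clip; duration-preserving methods (e.g. a phase vocoder, as in librosa's `effects.pitch_shift`) would be the more usual choice in practice.

```python
import math
from typing import List, Sequence, Tuple


def resample_pitch_shift(signal: Sequence[float], semitones: float) -> List[float]:
    """Hypothetical helper: shift pitch by `semitones` via linear-interpolation resampling.

    Reading the input faster by r = 2**(semitones / 12) raises every frequency
    by r, but (unlike a phase vocoder) also changes the duration by 1/r.
    """
    ratio = 2.0 ** (semitones / 12.0)
    n_out = int(len(signal) / ratio)  # output length; last read position stays < len(signal)
    out: List[float] = []
    for i in range(n_out):
        pos = i * ratio               # fractional read position in the input
        j = int(pos)
        frac = pos - j
        nxt = signal[j + 1] if j + 1 < len(signal) else signal[j]
        out.append((1.0 - frac) * signal[j] + frac * nxt)
    return out


def augment_with_pitch_shifts(
    dataset: Sequence[Tuple[Sequence[float], str]],
    shifts: Sequence[float] = (-2, -1, 1, 2),  # assumed semitone offsets, not from the paper
) -> List[Tuple[Sequence[float], str]]:
    """Return the original (audio, label) pairs plus one pitch-shifted copy per shift.

    Assumption: a pitch shift does not change the stroke category, so labels are reused.
    """
    augmented: List[Tuple[Sequence[float], str]] = list(dataset)
    for audio, label in dataset:
        for s in shifts:
            augmented.append((resample_pitch_shift(audio, s), label))
    return augmented


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 100 * t / sr) for t in range(sr)]  # 1 s, 100 Hz test tone
    up = resample_pitch_shift(tone, 12)  # one octave up: half as many samples
    print(len(tone), len(up))
```

Multiplying the training set this way adds timbral diversity cheaply, but, as the abstract notes, generic augmentation may not cover all instrument-dependent variation, which motivates more task-specific strategies.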
