Francisco Gomez-Martin

AS Aug 1, 2020

Singer Identification Using Convolutional Acoustic Motif Embeddings

Aitor Arronte Alvarez, Francisco Gomez-Martin

Flamenco singing is characterized by pitch instability, micro-tonal ornamentations, large vibrato ranges, and a high degree of melodic variability. These musical features make the automatic identification of flamenco singers a difficult computational task. In this article, we present an end-to-end pipeline for flamenco singer identification based on acoustic motif embeddings. Our approach approximates the fundamental frequency obtained directly from the raw audio signal. This approximation reduces the high variability of the audio signal and allows small melodic patterns to be discovered using a sequential pattern mining technique, thus creating a dictionary of motifs. Several acoustic features are then used to extract fixed-length embeddings of variable-length motifs using convolutional architectures. We test the quality of the embeddings in a flamenco singer identification task, compare our approach with previous deep learning architectures, and study the effect of motivic patterns and acoustic features on identification. Results indicate that motivic patterns play a crucial role in identifying flamenco singers: they minimize the size of the signal to be learned and discard information that is not relevant to the identification task. The deep learning architecture presented outperforms denser models used in large-scale audio classification problems.
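The first two stages described in the abstract (approximating the fundamental frequency, then mining short recurring patterns into a motif dictionary) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how the approximation or the mining is done, so the sketch quantizes an f0 track to semitones and counts interval n-grams as a simple stand-in for sequential pattern mining. The function names `hz_to_semitone`, `approximate_contour`, and `frequent_motifs` are hypothetical.

```python
import math
from collections import Counter


def hz_to_semitone(f0_hz, ref_hz=440.0):
    """Round a frequency in Hz to the nearest semitone number (A4 = 69)."""
    return int(round(69 + 12 * math.log2(f0_hz / ref_hz)))


def approximate_contour(f0_track):
    """Assumed approximation: quantize voiced frames to semitones and
    merge consecutive frames that land on the same semitone."""
    notes = []
    for f0 in f0_track:
        if f0 <= 0:  # treat non-positive values as unvoiced frames and skip them
            continue
        note = hz_to_semitone(f0)
        if not notes or notes[-1] != note:
            notes.append(note)
    return notes


def frequent_motifs(note_seqs, n=3, min_support=2):
    """Stand-in for sequential pattern mining: keep interval n-grams
    that occur in at least `min_support` sequences."""
    support = Counter()
    for notes in note_seqs:
        intervals = [b - a for a, b in zip(notes, notes[1:])]
        # A set counts each motif at most once per sequence.
        grams = {tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)}
        support.update(grams)
    return {motif: count for motif, count in support.items() if count >= min_support}
```

Working on intervals rather than absolute pitches makes the motifs transposition-invariant, which is one plausible design choice and not necessarily the one the authors made.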

SD Apr 24, 2019

An Attentional Neural Network Architecture for Folk Song Classification

Aitor Arronte-Alvarez, Francisco Gomez-Martin

In this paper, we present an attentional neural network for folk song classification. We introduce the concept of musical motif embedding and show how, using local melodic context, we can model monophonic folk song motifs with the skip-gram version of the word2vec algorithm. We use the motif embeddings to represent folk songs from Germany, China, and Sweden, and classify them using an attentional neural network that is able to discern relevant motifs in a song. The results show that the network obtains state-of-the-art accuracy in a completely unsupervised manner, and that motif embeddings produce high-quality motif representations from folk songs. We conjecture on the advantages of this type of representation in large symbolic music corpora, and how it can be helpful in the musicological analysis of folk song collections from different cultures and geographical areas.
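The skip-gram idea mentioned in the abstract treats each motif as a "word" and its neighbouring motifs in the song as context. The sketch below shows only how such (center, context) training pairs could be generated. It is an illustration under stated assumptions: the abstract does not specify how motifs are segmented, so overlapping interval n-grams are assumed here, and the names `motifs_from_melody` and `skipgram_pairs` are hypothetical.

```python
def motifs_from_melody(pitches, n=3):
    """Assumed segmentation: overlapping n-grams of pitch intervals."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]


def skipgram_pairs(tokens, window=2):
    """Generate (center, context) pairs as in skip-gram training:
    each token is paired with every neighbour within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

In a full pipeline, pairs like these would feed a skip-gram model that learns one vector per motif; the training step itself is omitted here.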

2 Papers