Transforming Musical Signals through a Genre Classifying Convolutional Neural Network
An incremental contribution of interest to music researchers and AI practitioners working on audio transformation and network interpretability.
The authors tackled the problem of manipulating existing music with a convolutional neural network trained for genre classification, producing transformed audio clips that reveal how the network interprets musical features.
Convolutional neural networks (CNNs) have been successfully applied to both discriminative and generative modeling in music-related tasks. For a particular task, the trained CNN contains information that represents its decision-making or abstraction process. One can hope to manipulate existing music based on this 'informed' network and create music with new features corresponding to the knowledge the network has acquired. In this paper, we propose a method that uses the information stored in a CNN trained on a musical genre classification task. The network was composed of three convolutional layers and was trained to classify five-second song clips into five genres. After training, randomly selected clips were modified by maximizing the sum of the outputs of the network layers. Beyond the potential of such CNNs to produce interesting audio transformations, analyzing the generated features can reveal more about both the network and the original music, since these features indicate how the network 'understands' the music.
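Modifying a clip by maximizing the sum of layer outputs amounts to gradient ascent on the input, similar in spirit to activation-maximization methods such as DeepDream: with objective J(x) = sum over layers l and positions i of a_l[i](x), each step updates x <- x + lr * dJ/dx. The sketch below illustrates only that idea and is not the authors' implementation. Assumptions: a toy model of three single-channel 1-D convolution layers with ReLU and random weights stands in for the trained classifier; "the network layers" is taken to mean the three convolutional layers; the input is a generic 1-D feature sequence rather than the paper's actual audio representation; all function names, the step size `lr`, and the step count are hypothetical. Gradients are derived by hand so the example stays dependency-free.

```python
import random

# Hypothetical toy model: three single-channel 1-D convolution layers with ReLU,
# standing in for the paper's trained genre classifier (weights here are random).

def conv1d(x, w, b):
    """Valid 1-D cross-correlation: z[i] = b + sum_k w[k] * x[i + k]."""
    n = len(x) - len(w) + 1
    return [b + sum(w[k] * x[i + k] for k in range(len(w))) for i in range(n)]

def conv1d_backward(dz, w, input_len):
    """Gradient w.r.t. the conv input, given the upstream gradient dz on its output."""
    dx = [0.0] * input_len
    for i, g in enumerate(dz):
        for k, wk in enumerate(w):
            dx[i + k] += wk * g
    return dx

def relu(z):
    return [max(0.0, v) for v in z]

def forward(x, layers):
    """Return (pre-activations, activations) for every layer."""
    zs, acts = [], []
    h = x
    for w, b in layers:
        z = conv1d(h, w, b)
        h = relu(z)
        zs.append(z)
        acts.append(h)
    return zs, acts

def objective(x, layers):
    """Sum of the outputs (ReLU activations) of all layers."""
    _, acts = forward(x, layers)
    return sum(sum(a) for a in acts)

def objective_grad(x, layers):
    """Backpropagate d(objective)/dx by hand, from the last layer to the input."""
    zs, acts = forward(x, layers)
    grad_act = [0.0] * len(acts[-1])
    for idx in reversed(range(len(layers))):
        # Each layer's activations also contribute directly to the summed objective.
        grad_act = [g + 1.0 for g in grad_act]
        # ReLU passes gradient only where the pre-activation was positive.
        dz = [g if z > 0 else 0.0 for g, z in zip(grad_act, zs[idx])]
        input_len = len(x) if idx == 0 else len(acts[idx - 1])
        grad_act = conv1d_backward(dz, layers[idx][0], input_len)
    return grad_act

def dream(x, layers, lr=0.01, steps=50):
    """Gradient ascent on the input to maximize the summed layer outputs."""
    x = list(x)
    for _ in range(steps):
        g = objective_grad(x, layers)
        x = [xi + lr * gi for xi, gi in zip(x, g)]
    return x

random.seed(0)
kernel = 5
layers = [([random.gauss(0, 0.5) for _ in range(kernel)], 0.0) for _ in range(3)]
clip = [random.gauss(0, 1) for _ in range(64)]  # hypothetical stand-in for a clip
before = objective(clip, layers)
after = objective(dream(clip, layers), layers)
print(f"summed activations: {before:.3f} -> {after:.3f}")
```

In the paper's setting the weights would come from the trained genre classifier and the modified input would be converted back to audio; practical variants often normalize the gradient or regularize the input so the signal does not grow without bound, which this sketch omits.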