Exploring single-song autoencoding schemes for audio-based music structure analysis
This work addresses a known difficulty in music structure analysis, namely that annotations are tedious to collect and often ambiguous, which affects researchers and practitioners alike. The contribution is incremental, since it adapts existing autoencoding techniques to a specific domain.
The paper tackles music structure analysis by proposing a piece-specific autoencoder that learns a latent representation of each song without supervision. It reports performance comparable to supervised state-of-the-art methods at a 3-second tolerance on the RWC-Pop dataset.
The ability of deep neural networks to learn complex data relations and representations is now well established, but it generally relies on large sets of training data. This work explores a "piece-specific" autoencoding scheme, in which a low-dimensional autoencoder is trained to learn a latent (compressed) representation specific to a given song, which can then be used to infer the song structure. Such a model relies on neither supervision nor annotations, which are well known to be tedious to collect and often ambiguous in Music Structure Analysis. We report that the proposed unsupervised autoencoding scheme reaches the level of performance of supervised state-of-the-art methods at a 3-second tolerance when using a log-mel spectrogram representation on the RWC-Pop dataset.
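To make the piece-specific idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not specify the network architecture or how structure is inferred from the latent space, so everything here is an assumption made for illustration: a linear autoencoder with a 2-dimensional latent space, a toy "song" of eight bars drawn from two section prototypes (a hypothetical stand-in for barwise log-mel features), and a cosine self-similarity matrix over the latent vectors. The helper names (`make_toy_song`, `train_autoencoder`) are hypothetical too.

```python
import math
import random


def make_toy_song(seed=0):
    """Hypothetical stand-in for per-bar features: 8 bars, sections A and B."""
    rng = random.Random(seed)
    proto = {"A": [1, 0, 1, 0, 1, 0], "B": [0, 1, 0, 1, 0, 1]}
    labels = list("AABBAABB")
    bars = [[v + rng.gauss(0, 0.05) for v in proto[s]] for s in labels]
    return bars, labels


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def train_autoencoder(bars, latent_dim=2, lr=0.05, epochs=300, seed=0):
    """Fit a linear autoencoder to ONE song by full-batch gradient descent."""
    rng = random.Random(seed)
    d, n = len(bars[0]), len(bars)
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(latent_dim)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(latent_dim)] for _ in range(d)]
    losses = []
    for _ in range(epochs):
        g_enc = [[0.0] * d for _ in range(latent_dim)]
        g_dec = [[0.0] * latent_dim for _ in range(d)]
        loss = 0.0
        for x in bars:
            z = matvec(enc, x)                      # latent code of this bar
            err = [r - xi for r, xi in zip(matvec(dec, z), x)]
            loss += sum(e * e for e in err) / n     # mean squared error
            # Error propagated back to the latent layer: dec^T err.
            back = [sum(dec[i][k] * err[i] for i in range(d))
                    for k in range(latent_dim)]
            for i in range(d):
                for k in range(latent_dim):
                    g_dec[i][k] += 2 * err[i] * z[k] / n
            for k in range(latent_dim):
                for j in range(d):
                    g_enc[k][j] += 2 * back[k] * x[j] / n
        losses.append(loss)  # loss measured before this epoch's update
        for i in range(d):
            for k in range(latent_dim):
                dec[i][k] -= lr * g_dec[i][k]
        for k in range(latent_dim):
            for j in range(d):
                enc[k][j] -= lr * g_enc[k][j]
    return enc, losses


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


bars, labels = make_toy_song()
enc, losses = train_autoencoder(bars)
latents = [matvec(enc, x) for x in bars]            # one latent vector per bar
ssm = [[cosine(u, v) for v in latents] for u in latents]
```

In this toy setting, bars from the same section should end up closer in the latent space than bars from different sections, so `ssm` shows a block pattern from which section boundaries could be read. A real pipeline would replace the toy bars with features computed from audio and would likely use a nonlinear network. Both are outside the scope of this sketch.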