Nicolas Jonason

h-index: 4

Papers: 4

Citations: 9

Novelty: 26%

AI Score: 19

Ranked #197,481 of 205,806 authors (top 96%); #1,749 in SD (top 95%)

4 Papers

SD · Nov 21, 2022

TimbreCLIP: Connecting Timbre to Text and Images

Nicolas Jonason, Bob L. T. Sturm

We present work in progress on TimbreCLIP, an audio-text cross-modal embedding trained on single instrument notes. We evaluate the models with a cross-modal retrieval task on synth patches. Finally, we demonstrate the application of TimbreCLIP on two tasks: text-driven audio equalization and timbre-to-image generation.
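
As a rough illustration of what a cross-modal retrieval task involves, the sketch below ranks audio items by cosine similarity to a text query in a shared embedding space. The three-dimensional vectors and the patch names are made up for illustration; they are not TimbreCLIP outputs, and the abstract does not say which similarity measure the authors use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def retrieve(query, candidates):
    """Return candidate names sorted from most to least similar to the query."""
    scores = {name: cosine_similarity(query, emb) for name, emb in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical embeddings: one text query and three synth patches.
text_query = [0.9, 0.1, 0.0]
audio_patches = {
    "bright_lead": [0.8, 0.2, 0.1],
    "dark_pad": [0.1, 0.9, 0.2],
    "noisy_fx": [0.0, 0.2, 0.9],
}

ranking = retrieve(text_query, audio_patches)
print(ranking)
```

Retrieval quality is then typically summarised by how high the correct item lands in such a ranking.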

SD · Dec 5, 2022

Audio Latent Space Cartography

Nicolas Jonason, Bob L. T. Sturm

We explore the generation of visualisations of audio latent spaces using an audio-to-image generation pipeline. We believe this can help with the interpretability of audio latent spaces. We demonstrate a variety of results on the NSynth dataset. A web demo is available.

SD · May 21, 2024

SYMPLEX: Controllable Symbolic Music Generation using Simplex Diffusion with Vocabulary Priors

Nicolas Jonason, Luca Casini, Bob L. T. Sturm

We present a new approach for fast and controllable generation of symbolic music based on simplex diffusion, which is essentially a diffusion process operating on probabilities rather than the signal space. This objective has been applied in domains such as natural language processing, but here we apply it to generating 4-bar multi-instrument music loops using an orderless representation. We show that our model can be steered with vocabulary priors, which affords a considerable level of control over the music generation process, for instance, infilling in time and pitch and choice of instrumentation -- all without task-specific model adaptation or applying extrinsic control.
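
One way to picture steering with a vocabulary prior is to reweight a per-token probability distribution by the prior and renormalise. The sketch below does only that; it is a simplified illustration under that assumption, not the SYMPLEX model or its sampling procedure, and the four-token vocabulary and probabilities are hypothetical.

```python
def apply_vocabulary_prior(probs, prior):
    """Reweight a categorical distribution by a prior and renormalise.

    probs: model probabilities over the vocabulary (sums to 1).
    prior: non-negative weights, one per token; 0 forbids a token.
    """
    if len(probs) != len(prior):
        raise ValueError("probs and prior must have the same length")
    weighted = [p * w for p, w in zip(probs, prior)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("prior excludes every token with non-zero probability")
    return [w / total for w in weighted]


# Hypothetical vocabulary: three pitches and one drum hit.
vocab = ["pitch:60", "pitch:62", "pitch:64", "drum:kick"]
model_probs = [0.1, 0.2, 0.3, 0.4]

# Prior that allows pitched tokens only, e.g. to exclude drums.
pitch_only_prior = [1.0, 1.0, 1.0, 0.0]

constrained = apply_vocabulary_prior(model_probs, pitch_only_prior)
print(dict(zip(vocab, constrained)))
```

Because the constraint is expressed purely as weights over the vocabulary, different controls (pitch range, instrument choice) reduce to different priors rather than different models.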

SD · May 5, 2023

Exploring Softly Masked Language Modelling for Controllable Symbolic Music Generation

Nicolas Jonason, Bob L. T. Sturm

This document presents some early explorations of applying Softly Masked Language Modelling (SMLM) to symbolic music generation. SMLM can be seen as a generalisation of masked language modelling (MLM), where instead of each element of the input set being either known or unknown, each element can be known, unknown or partly known. We demonstrate some results of applying SMLM to constrained symbolic music generation using a transformer encoder architecture. Several audio examples are available at https://erl-j.github.io/smlm-web-supplement/
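
The known / unknown / partly known distinction can be sketched as a constraint vector over the vocabulary for each element. The multi-hot encoding below is an assumption made for illustration, since the abstract does not specify how the soft mask is represented; the note names and the `soft_mask` helper are hypothetical.

```python
def soft_mask(vocab, allowed):
    """Return a multi-hot vector marking which vocabulary tokens are allowed."""
    allowed = set(allowed)
    unexpected = allowed - set(vocab)
    if unexpected:
        raise ValueError(f"tokens not in vocabulary: {sorted(unexpected)}")
    return [1.0 if token in allowed else 0.0 for token in vocab]


# Hypothetical four-note vocabulary.
vocab = ["C4", "D4", "E4", "F4"]

known = soft_mask(vocab, ["E4"])          # exactly one option: element is known
unknown = soft_mask(vocab, vocab)         # every option: element is fully masked
partly = soft_mask(vocab, ["C4", "E4"])   # a subset: element is partly known

print(known, unknown, partly)
```

Under this encoding, ordinary masked language modelling corresponds to using only the first two cases.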