Synthesizer Preset Interpolation using Transformer Auto-Encoders
This work addresses the need for easier sound design in music production. Its contribution is incremental, since it builds on existing interpolation techniques.
The paper tackles the problem of enabling intuitive sound creation by interpolating between synthesizer presets, and demonstrates that its bimodal auto-encoder model produces smoother interpolations than related methods.
Sound synthesizers are widespread in modern music production but they increasingly require expert skills to be mastered. This work focuses on interpolation between presets, i.e., sets of values of all sound synthesis parameters, to enable the intuitive creation of new sounds from existing ones. We introduce a bimodal auto-encoder neural network, which simultaneously processes presets using multi-head attention blocks, and audio using convolutions. This model has been tested on a popular frequency modulation synthesizer with more than one hundred parameters. Experiments have compared the model to related architectures and methods, and have demonstrated that it performs smoother interpolations. After training, the proposed model can be integrated into commercial synthesizers for live interpolation or sound design tasks.
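The idea of interpolating between presets through an auto-encoder can be illustrated with a minimal sketch. It assumes that a trained model exposes an encoder and a decoder as plain functions on preset vectors; the names `encode`, `decode`, `linear_path`, and `interpolate_presets` are hypothetical, the audio branch of the bimodal model is left out, and a toy invertible linear map stands in for the trained network. This is not the paper's implementation.

```python
import numpy as np


def linear_path(start, end, n_steps):
    """Return n_steps points evenly spaced from start to end (inclusive)."""
    t = np.linspace(0.0, 1.0, n_steps)[:, None]  # shape (n_steps, 1)
    return (1.0 - t) * start + t * end


def interpolate_presets(encode, decode, preset_a, preset_b, n_steps=5):
    """Encode both presets, interpolate linearly in latent space,
    and decode every intermediate latent code back to a preset."""
    z_a, z_b = encode(preset_a), encode(preset_b)
    return np.stack([decode(z) for z in linear_path(z_a, z_b, n_steps)])


# Toy stand-ins for a trained encoder/decoder (assumption, for illustration):
# an invertible linear map and its inverse.
W = 2.0 * np.eye(4) + 0.1 * np.ones((4, 4))
W_inv = np.linalg.inv(W)


def encode(preset):
    return W @ preset


def decode(latent):
    return W_inv @ latent


# Two presets with four normalized parameters each (made-up values).
preset_a = np.array([0.0, 0.2, 0.5, 1.0])
preset_b = np.array([1.0, 0.8, 0.5, 0.0])

path = interpolate_presets(encode, decode, preset_a, preset_b, n_steps=5)
print(path.shape)  # one interpolated preset per row
```

Because the toy encoder is linear, this path coincides with a direct linear interpolation of the parameters. A trained non-linear auto-encoder would generally decode a different path, which is where any gain in smoothness would come from.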