SD AI LG AS · Mar 14, 2025

Designing Neural Synthesizers for Low-Latency Interaction

arXiv:2503.11562v2 · 2 citations · h-index: 17 · J Audio Eng Soc
Originality: Incremental advance
AI Analysis

This work addresses latency in neural audio synthesizers so that musicians can interact with them more intimately. The advance is incremental, since it builds on the existing RAVE model.

The paper tackles high latency in neural audio synthesis models for musical interaction by analyzing the sources of latency and proposing an iterative design approach. The result is BRAVE, a low-latency model that replicates pitch and loudness better than RAVE while retaining similar timbre modification capabilities.

Neural Audio Synthesis (NAS) models offer interactive musical control over high-quality, expressive audio generators. While these models can operate in real-time, they often suffer from high latency, making them unsuitable for intimate musical interaction. The impact of architectural choices in deep learning models on audio latency remains largely unexplored in the NAS literature. In this work, we investigate the sources of latency and jitter typically found in interactive NAS models. We then apply this analysis to the task of timbre transfer using RAVE, a convolutional variational autoencoder for audio waveforms introduced by Caillon et al. in 2021. Finally, we present an iterative design approach for optimizing latency. This culminates with a model we call BRAVE (Bravely Realtime Audio Variational autoEncoder), which is low-latency and exhibits better pitch and loudness replication while showing timbre modification capabilities similar to RAVE. We implement it in a specialized inference framework for low-latency, real-time inference and present a proof-of-concept audio plugin compatible with audio signals from musical instruments. We expect the challenges and guidelines described in this document to support NAS researchers in designing models for low-latency inference from the ground up, enriching the landscape of possibilities for musicians.
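To make the latency and jitter terms in the abstract concrete, the following minimal sketch computes the delay incurred by waiting for one block of audio samples, and summarizes jitter as the spread of measured latencies. It is an illustration, not the paper's analysis. The function names, block sizes, and sample rate are hypothetical, and taking jitter as a standard deviation is one common convention that the paper may not share.

```python
import statistics


def buffering_latency_ms(block_size: int, sample_rate: int) -> float:
    """Time needed to collect one block of samples, in milliseconds.

    A block-based model cannot start processing until the whole block
    has arrived, so this is a lower bound on its input-to-output delay.
    """
    return 1000.0 * block_size / sample_rate


def jitter_ms(latencies_ms: list[float]) -> float:
    """Jitter summarized as the population standard deviation of latencies.

    Assumption: other definitions (e.g. max minus min) are equally valid.
    """
    return statistics.pstdev(latencies_ms)


if __name__ == "__main__":
    sample_rate = 44100  # illustrative value, not taken from the paper
    for block_size in (128, 512, 2048):  # illustrative block sizes
        ms = buffering_latency_ms(block_size, sample_rate)
        print(f"{block_size:5d} samples -> {ms:6.2f} ms")
```

The sketch covers only the buffering component. Compute time, model lookahead, and audio driver buffers add further delay on top of it.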

