LiveBand: Live Accompaniment Generation in the Audio Domain
This work addresses real-time accompaniment generation for live music performance, offering a practical system that improves output quality over prior work and requires no lookahead.
LiveBand generates real-time, high-fidelity music accompaniments to live audio input using a causal transformer in a continuous latent space, improving over prior work on audio quality, beat alignment, and mix adherence while enabling streaming on consumer hardware.
We present LiveBand, a real-time system that generates high-fidelity music accompaniments to live audio input under strict causal constraints. Our method trains a causal transformer generator in the continuous latent space of a pre-trained causal audio autoencoder, using adversarial sequence-level supervision from a discriminator. At each timestep, the generator receives only the causally available mix context and Gaussian noise, and predicts accompaniment latents without access to future mix frames or ground-truth target latents. Training is performed in a single parallel forward pass under causal masking, while streaming inference proceeds autoregressively with a rolling attention state. Training and inference computations are matched by design, which eliminates teacher forcing and the associated exposure bias. On a multi-instrument music accompaniment benchmark, LiveBand improves over prior work on objective measures of audio quality, beat alignment, and mix adherence, while enabling real-time streaming generation on consumer hardware with no lookahead.
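The claim that training and inference computations are matched can be illustrated with a toy example. The sketch below is not the LiveBand architecture: it assumes a single attention layer with one head, a hypothetical fixed attention window standing in for the rolling attention state, and an input formed by concatenating each mix latent frame with Gaussian noise. All names, dimensions, and weights are illustrative. It shows only that a parallel pass under a causal (windowed) mask and a frame-by-frame pass over cached keys and values can produce the same outputs, with no ground-truth target latents fed back in.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def parallel_forward(x, Wq, Wk, Wv, window):
    """Training-style pass: all frames at once under a causal, windowed mask."""
    T = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    i = np.arange(T)[:, None]  # query frame index
    j = np.arange(T)[None, :]  # key frame index
    allowed = (j <= i) & (j > i - window)  # no future frames, bounded past
    scores = np.where(allowed, scores, -np.inf)
    return softmax(scores) @ v

def streaming_forward(x, Wq, Wk, Wv, window):
    """Inference-style pass: one frame at a time with a rolling key/value cache."""
    keys, values, outputs = [], [], []
    for x_t in x:
        keys.append(x_t @ Wk)
        values.append(x_t @ Wv)
        # Keep only the most recent `window` frames (assumed rolling state).
        keys, values = keys[-window:], values[-window:]
        K, V = np.stack(keys), np.stack(values)
        q_t = x_t @ Wq
        weights = softmax(q_t @ K.T / np.sqrt(q_t.shape[-1]))
        outputs.append(weights @ V)
    return np.stack(outputs)

# Illustrative sizes only; none of these values come from the paper.
rng = np.random.default_rng(0)
T, d_mix, d_noise, window = 12, 4, 4, 5
mix = rng.standard_normal((T, d_mix))      # stand-in for causal mix latents
noise = rng.standard_normal((T, d_noise))  # Gaussian noise input per frame
x = np.concatenate([mix, noise], axis=-1)
d = d_mix + d_noise
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

out_parallel = parallel_forward(x, Wq, Wk, Wv, window)
out_stream = streaming_forward(x, Wq, Wk, Wv, window)
print("parallel and streaming outputs match:", np.allclose(out_parallel, out_stream))
```

Because the inputs at every frame are the mix context and noise rather than previously generated or ground-truth accompaniment latents, the same input sequence can be used in both modes, which is the property the abstract describes as removing teacher forcing.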