Matthew Rice

SD
3 papers
164 citations
Novelty 48%
AI Score 44

3 Papers

SY May 22, 2018
A Distributed Version of the Hungarian Method for Multi-Robot Assignment

Smriti Chopra, Giuseppe Notarstefano, Matthew Rice et al.

In this paper, we propose a distributed version of the Hungarian Method to solve the well-known assignment problem. In the context of multi-robot applications, all robots cooperatively compute a common assignment that optimizes a given global criterion (e.g., the total distance traveled) within a finite set of local computations and communications over a peer-to-peer network. As a motivating application, we consider a class of multi-robot routing problems with "spatio-temporal" constraints, i.e., spatial targets that require servicing at particular time instants. To demonstrate the theory developed in this paper, the robots cooperatively find online, suboptimal routes by applying an iterative version of the proposed algorithm in a distributed and dynamic setting. As a concrete experimental test-bed, we provide an interactive "multi-robot orchestral" framework in which a team of robots cooperatively plays a piece of music on a so-called orchestral floor.
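For readers unfamiliar with the assignment problem named in the abstract, the following is a minimal sketch of what "a common assignment that optimizes a given global criterion" means. It is a centralized brute-force search, not the distributed Hungarian Method the paper proposes, and it is only practical for a handful of robots. The function name `best_assignment` and the cost matrix are hypothetical and chosen purely for illustration.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (total_cost, assignment) minimizing the summed cost.

    cost[i][j] is the cost (e.g. distance) for robot i to serve target j.
    assignment[i] is the target index given to robot i.
    Brute force over all permutations: O(n!) time, illustration only.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[robot][target] for robot, target in enumerate(perm))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


# Hypothetical distances between 3 robots (rows) and 3 targets (columns).
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, assignment = best_assignment(cost)
print(total, assignment)  # 5 (1, 0, 2)
```

Here robot 0 takes target 1, robot 1 takes target 0, and robot 2 takes target 2, for a total cost of 5. The Hungarian Method reaches the same optimum in polynomial time; the paper's contribution is computing it through local computations and peer-to-peer communication rather than on a single machine.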

24.3 SD May 18
SAME: A Semantically-Aligned Music Autoencoder

Julian D. Parker, Zach Evans, CJ Carr et al.

Latent representations are at the heart of the majority of modern generative models. In the audio domain they are typically produced by a neural-audio-codec autoencoder. In this work we introduce SAME (Semantically-Aligned Music autoEncoder), an autoencoder for stereo music and general audio that reaches a 4096$\times$ temporal compression ratio while maintaining reconstruction quality and downstream generative performance. We achieve this by combining a transformer-based backbone with a set of semantic regularisation approaches, phase-aware reconstruction losses, and improved discriminator designs. The architecture delivers substantial computational cost benefits, through both its high compression ratio and its reliance on well-optimised transformer primitives. Two variants (a large SAME-L and a CPU-deployable SAME-S) are released in open-weights form.
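To give a sense of scale for the 4096$\times$ temporal compression ratio, the sketch below computes how many latent frames a clip would occupy. It assumes the ratio means 4096 input samples per latent frame per channel, and it assumes a 44.1 kHz sample rate; neither detail is stated in the abstract, and the function names are hypothetical.

```python
import math

COMPRESSION_RATIO = 4096  # temporal compression ratio quoted in the abstract


def latent_rate_hz(sample_rate_hz=44_100):
    """Latent frames per second (sample rate is an assumption, not from the abstract)."""
    return sample_rate_hz / COMPRESSION_RATIO


def latent_frames(duration_s, sample_rate_hz=44_100):
    """Latent frames needed to cover a clip, rounding up for a partial final frame."""
    num_samples = round(duration_s * sample_rate_hz)
    return math.ceil(num_samples / COMPRESSION_RATIO)


print(round(latent_rate_hz(), 2))  # 10.77
print(latent_frames(60))           # 646
```

Under these assumptions, one minute of audio (2,646,000 samples per channel) maps to 646 latent frames, roughly 10.8 frames per second, which is why a downstream generative model operating on the latent sequence is comparatively cheap.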

39.1 SD May 18
Stable Audio 3

Zach Evans, Julian D. Parker, Matthew Rice et al.

Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. Since our models can generate several minutes of audio, variable-length generations are key to avoiding the cost of producing full-length generations for short sounds. We also support inpainting, enabling targeted audio editing and the continuation of short recordings. Our latent diffusion models operate on top of a novel semantic-acoustic autoencoder that projects audio into a compact latent space, enabling efficient diffusion-based generation while preserving audio fidelity and encouraging semantic structure in the latent. Finally, we run adversarial post-training to both accelerate inference and improve generation quality, reducing the number of inference steps while improving fidelity and prompt adherence. Stable Audio 3 models are trained on licensed and Creative Commons data to generate music and sounds in less than 2 s on an H200 GPU and less than a few seconds on a MacBook Pro M4. We release the weights of the small and medium models, which can run on consumer-grade hardware, together with their training and inference pipeline.