SD · AI · AS · Jul 19, 2024

Stable Audio Open

Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

arXiv:2407.14358v241.2226 citations · h-index: 9 · Has Code

Originality: Incremental advance

AI Analysis

This work provides an open baseline that artists and researchers can build upon, addressing the scarcity of accessible generative audio models.

The authors address the lack of accessible text-to-audio models by developing an open-weights model trained on Creative Commons data. Its performance is competitive with state-of-the-art models across various metrics, and its FDopenl3 results suggest potential for high-quality stereo sound synthesis at 44.1 kHz.

Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.
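FDopenl3 is a Fréchet distance computed on OpenL3 audio embeddings: a Gaussian is fitted to the embeddings of generated audio and another to those of reference audio, and the distance between the two Gaussians is reported (lower means the generations are statistically closer to the reference). The sketch below shows only that distance computation, assuming the embeddings have already been extracted into arrays of shape `(num_clips, embedding_dim)`. The function names are hypothetical, the OpenL3 extraction step is not shown, and this is not the authors' evaluation code; their exact protocol (clip lengths, stereo handling, reference set) may differ.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (num_clips, embedding_dim), assumed here
    to hold per-clip embeddings (e.g. hypothetically from OpenL3).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(emb_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(emb_b, rowvar=False))

    # Tr((S_a S_b)^{1/2}) equals Tr((S_a^{1/2} S_b S_a^{1/2})^{1/2}); the
    # second form is symmetric PSD, so its eigenvalues are real and >= 0.
    sqrt_a = _psd_sqrt(cov_a)
    inner = sqrt_a @ cov_b @ sqrt_a
    inner = (inner + inner.T) / 2.0  # symmetrize against round-off
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()

    mean_term = float(np.sum((mu_a - mu_b) ** 2))
    cov_term = float(np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)
    return mean_term + cov_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(2000, 8))           # stand-in "real" embeddings
    close = rng.normal(size=(2000, 8))               # same distribution
    shifted = rng.normal(loc=1.0, size=(2000, 8))    # mean shifted by 1 per dim
    print(frechet_distance(reference, close) < frechet_distance(reference, shifted))  # True
```

Comparing means and covariances rather than individual clips is what makes the score a measure of distributional realism: a model can match each reference clip poorly yet still score well if its outputs, taken together, occupy the same region of embedding space.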

