Overview of the Amphion Toolkit (v0.2)
This toolkit lowers the entry barrier for junior researchers and engineers in audio generation, although, as a new release of an existing tool, its contribution appears incremental.
Amphion Toolkit v0.2 is an open-source framework for audio, music, and speech generation. It aims to make these fields more accessible by providing a versatile framework together with a 100K-hour multilingual dataset, a robust data preparation pipeline, and novel models for tasks such as text-to-speech, audio coding, and voice conversion.
Amphion is an open-source toolkit for Audio, Music, and Speech Generation, designed to lower the entry barrier for junior researchers and engineers in these fields. It provides a versatile framework that supports a variety of generation tasks and models. In this report, we introduce Amphion v0.2, the second major release developed in 2024. This release features a 100K-hour open-source multilingual dataset, a robust data preparation pipeline, and novel models for tasks such as text-to-speech, audio coding, and voice conversion. Furthermore, the report includes multiple tutorials that guide users through the functionalities and usage of the newly released models.