SDAIAS · Feb 8, 2025

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

arXiv:2502.05512v1 · 38 citations · h-index: 2 · Has Code
Originality: Highly original
AI Analysis

IndexTTS addresses the industry's need for a controllable and efficient zero-shot text-to-speech system, particularly for Chinese, giving users a way to obtain high-quality voice synthesis.

IndexTTS is a text-to-speech synthesis system. Compared with XTTS, it achieves significant improvements in naturalness, content consistency, and zero-shot voice cloning. Compared with popular open-source TTS systems such as Fish-Speech, CosyVoice2, FireRedTTS, and F5-TTS, it offers faster inference, and the authors report that it also outperforms them.

Recently, large language model (LLM) based text-to-speech (TTS) systems have gradually become the mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise models, with several novel improvements. Specifically, in Chinese scenarios, we adopt a hybrid modeling method that combines characters and pinyin, making the pronunciation of polyphonic and long-tail characters controllable. We also compare Vector Quantization (VQ) with Finite Scalar Quantization (FSQ) in terms of codebook utilization for acoustic speech tokens. To further improve the quality and stability of voice cloning, we introduce a Conformer-based speech conditional encoder and replace the speechcode decoder with BigVGAN2. Compared with XTTS, IndexTTS achieves significant improvements in naturalness, content consistency, and zero-shot voice cloning. Compared with popular open-source TTS systems such as Fish-Speech, CosyVoice2, FireRedTTS, and F5-TTS, IndexTTS has a relatively simple training process, more controllable usage, and faster inference, and its performance surpasses that of these systems. Our demos are available at https://index-tts.github.io.
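To make the VQ-versus-FSQ comparison concrete, here is a minimal Python sketch, assuming the standard FSQ formulation: each latent dimension is bounded with tanh, scaled to a small odd number of integer levels, and rounded, so the codebook is the implicit grid of level combinations. VQ instead picks the nearest entry of a codebook by squared Euclidean distance. The level configuration `[5, 5, 5]`, the helper names (`fsq_quantize`, `fsq_index`, `vq_index`, `codebook_utilization`), and the random data are hypothetical illustrations, not IndexTTS's settings or code, and the demo does not reproduce the paper's results.

```python
import math
import random
from typing import List, Sequence


def fsq_quantize(z: Sequence[float], levels: Sequence[int]) -> List[int]:
    """Finite Scalar Quantization of one latent vector (odd level counts only).

    Each dimension is squashed with tanh, scaled to [-(L-1)/2, (L-1)/2]
    and rounded half-up to the nearest integer level.
    """
    codes = []
    for value, n_levels in zip(z, levels):
        if n_levels % 2 == 0:
            raise ValueError("this sketch only supports odd level counts")
        half = (n_levels - 1) // 2
        bounded = math.tanh(value) * half
        codes.append(math.floor(bounded + 0.5))
    return codes


def fsq_index(codes: Sequence[int], levels: Sequence[int]) -> int:
    """Map per-dimension integer codes to one token id (mixed-radix encoding)."""
    index, stride = 0, 1
    for code, n_levels in zip(codes, levels):
        index += (code + (n_levels - 1) // 2) * stride  # shift code to 0..L-1
        stride *= n_levels
    return index


def vq_index(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Vector quantization: id of the nearest codeword (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(z, codebook[i])),
    )


def codebook_utilization(indices: Sequence[int], codebook_size: int) -> float:
    """Fraction of codebook entries that are used at least once."""
    return len(set(indices)) / codebook_size


if __name__ == "__main__":
    rng = random.Random(0)
    levels = [5, 5, 5]  # hypothetical FSQ config: 5 * 5 * 5 = 125 implicit codes
    size = math.prod(levels)
    latents = [[rng.gauss(0, 1) for _ in levels] for _ in range(2000)]
    # Hypothetical VQ codebook of the same size, deliberately initialised at a
    # wider scale than the data so that some entries are never the nearest.
    codebook = [[rng.gauss(0, 3) for _ in levels] for _ in range(size)]

    fsq_ids = [fsq_index(fsq_quantize(z, levels), levels) for z in latents]
    vq_ids = [vq_index(z, codebook) for z in latents]
    print("FSQ utilization:", codebook_utilization(fsq_ids, size))
    print("VQ utilization:", codebook_utilization(vq_ids, size))
```

Utilization here is simply the share of codebook entries hit at least once. Because an FSQ codebook is a fixed grid rather than a set of learned vectors, no entry can drift away from the data, whereas a learned VQ codebook can leave entries unused when they end up far from every encoder output.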

Code Implementations: 1 repo
