AS · LG · SD · SP · Jun 4, 2025

BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing

Masaya Kawamura, Takuya Hasumi, Yuma Shirahata, Ryuichi Yamamoto

arXiv:2506.03515v1 · h-index: 16 · INTERSPEECH

Originality: Incremental advance

AI Analysis

This work addresses the need for compact TTS models for on-device applications, representing an incremental improvement in model compression techniques.

The paper tackles the problem of reducing model size for on-device text-to-speech by introducing 1.58-bit quantization and weight indexing, achieving an 83% reduction in model size while surpassing, in synthesis quality, an unquantized baseline of similar size.

This paper proposes a highly compact, lightweight text-to-speech (TTS) model for on-device applications. To reduce the model size, the proposed model introduces two techniques. First, we introduce quantization-aware training (QAT), which quantizes model parameters during training to as few as 1.58 bits. In this setting, most of the 32-bit model parameters are quantized to the ternary values {-1, 0, 1}. Second, we propose a method named weight indexing, in which a group of 1.58-bit weights is saved as a single int8 index. This allows model parameters to be stored efficiently even on hardware that handles values in 8-bit units. Experimental results demonstrate that the proposed method achieves an 83% reduction in model size while outperforming, in synthesis quality, a baseline of similar model size without quantization.
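The two ideas in the abstract can be illustrated with a small, self-contained Python sketch. It assumes a BitNet b1.58-style "absmean" per-tensor scale for ternarization, a plain straight-through estimator for QAT, and a group size of five weights per index (3^5 = 243 patterns fit in one byte). None of these details are stated in the abstract, and the names `ternarize`, `qat_step`, `pack_ternary`, `unpack_ternary`, `CODEBOOK`, and `OFFSET` are hypothetical. For reference, "1.58-bit" is log2(3) ≈ 1.585 bits, the information content of one ternary value; packing five values per byte costs 8/5 = 1.6 bits per weight, versus 2 bits with a naive two-bit code.

```python
import itertools
from array import array

# Assumption: 5 ternary weights per index, since 3**5 = 243 <= 256 codes.
GROUP = 5
# Codebook of every ternary pattern of length GROUP (243 entries).
CODEBOOK = list(itertools.product((-1, 0, 1), repeat=GROUP))
# Shift codes 0..242 into the signed int8 range -121..121 (hypothetical choice).
OFFSET = 121
INDEX_OF = {pattern: i - OFFSET for i, pattern in enumerate(CODEBOOK)}


def ternarize(weights, eps=1e-8):
    """Map real weights to {-1, 0, 1} with one per-tensor scale.

    Assumed absmean rule: scale = mean(|w|), q = clip(round(w / scale), -1, 1).
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def qat_step(latent, grad_wrt_quantized, lr=0.01):
    """Straight-through estimator update (assumed): gradients computed with the
    ternary weights in the forward pass are applied unchanged to the
    full-precision latent weights, which are re-ternarized on the next step."""
    return [w - lr * g for w, g in zip(latent, grad_wrt_quantized)]


def pack_ternary(q):
    """Weight indexing sketch: store each group of GROUP ternary values as one int8 index."""
    padded = list(q) + [0] * ((-len(q)) % GROUP)  # zero-pad the last group
    return array(
        "b",  # signed 8-bit storage
        (INDEX_OF[tuple(padded[i:i + GROUP])] for i in range(0, len(padded), GROUP)),
    )


def unpack_ternary(packed, n):
    """Decode by codebook lookup, then drop the padding to recover n weights."""
    out = []
    for code in packed:
        out.extend(CODEBOOK[code + OFFSET])
    return out[:n]


if __name__ == "__main__":
    w = [0.9, -0.05, -1.2, 0.4, 0.0, 2.0, -0.6]
    q, scale = ternarize(w)
    packed = pack_ternary(q)
    print(q, len(packed), unpack_ternary(packed, len(q)) == q)
```

Decoding costs one table lookup per byte, which is one reason an index of this kind suits hardware that addresses memory in 8-bit units. The scale stays in full precision, so its cost is amortized over the whole tensor.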
