SD · AS · Apr 19

ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning

arXiv:2604.14654 · 91.4 · h-index: 14
AI Analysis

For bandwidth-constrained communication (e.g., satellite, underwater), ClariCodec improves intelligibility at extreme compression levels where traditional codecs fail.

ClariCodec achieves 3.20% WER on LibriSpeech test-clean at 200 bps, a 13% relative reduction over its baseline, by using reinforcement learning to optimize intelligibility in ultra-low-bitrate speech coding.

In bandwidth-constrained communication such as satellite and underwater channels, speech must often be transmitted at ultra-low bitrates where intelligibility is the primary objective. At such extreme compression levels, codecs trained with acoustic reconstruction losses tend to allocate bits to perceptual detail, leading to substantial degradation in word error rate (WER). This paper proposes ClariCodec, a neural speech codec operating at 200 bits per second (bps) that reformulates quantisation as a stochastic policy, enabling reinforcement learning (RL)-based optimisation of intelligibility. Specifically, the encoder is fine-tuned using WER-driven rewards while the acoustic reconstruction pipeline remains frozen. Even without RL, ClariCodec achieves 3.68% WER on the LibriSpeech test-clean set at 200 bps, already competitive with codecs operating at higher bitrates. Further RL fine-tuning reduces WER to 3.20% on test-clean and 8.93% on test-other, corresponding to a 13% relative WER reduction on test-clean (from 3.68% to 3.20%) while preserving perceptual quality.
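
The idea of treating quantisation as a stochastic policy with a WER-driven reward can be illustrated with a toy sketch. The abstract does not specify the policy parameterisation, the reward shaping, or the gradient estimator, so everything below is an assumption: a softmax over per-frame codebook logits, a reward equal to negative WER, and a plain REINFORCE update with a constant baseline. The names `word_error_rate`, `sample_codes`, and `reinforce_grad` are hypothetical, and the hypothesis transcript is a hard-coded string standing in for ASR output on decoded audio.

```python
import numpy as np


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i, j] = edits needed to turn ref[:i] into hyp[:j]
    dist = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    dist[:, 0] = np.arange(len(ref) + 1)
    dist[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1, j] + 1
            insertion = dist[i, j - 1] + 1
            dist[i, j] = min(substitution, deletion, insertion)
    return float(dist[len(ref), len(hyp)]) / max(len(ref), 1)


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sample_codes(logits: np.ndarray, rng: np.random.Generator):
    """Assumed policy: sample one codebook index per frame from softmax(logits)."""
    probs = softmax(logits)
    codes = np.array([rng.choice(probs.shape[-1], p=p) for p in probs])
    return codes, probs


def reinforce_grad(logits, codes, reward, baseline):
    """REINFORCE estimate: (reward - baseline) * d/dlogits sum_t log pi(code_t)."""
    probs = softmax(logits)
    onehot = np.eye(logits.shape[-1])[codes]
    # The gradient of log-softmax at the sampled index is (onehot - probs).
    return (reward - baseline) * (onehot - probs)


# Toy usage: 4 frames and an 8-entry codebook (sizes chosen for illustration only).
rng = np.random.default_rng(0)
logits = np.zeros((4, 8))
codes, _ = sample_codes(logits, rng)

# Stand-in for "decode the codes, run ASR, compare with the transcript".
reward = -word_error_rate("the cat sat", "the cat sat down")
grad = reinforce_grad(logits, codes, reward, baseline=-0.5)
logits = logits + 0.1 * grad  # gradient ascent on the expected reward
```

In this sketch only the logits, which stand in for the encoder output, are updated; the decoding side is left untouched, mirroring the abstract's statement that the acoustic reconstruction pipeline stays frozen. Because the reward here is better than the baseline, the update raises the probability of the codes that were sampled.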
