SD LG AS

Jan 24, 2022

End-to-End Neural Speech Coding for Real-Time Communications

Xue Jiang, Xiulian Peng, Chengyu Zheng, Huaying Xue, Yuan Zhang, Yan Lu

arXiv:2201.09429v3

9.438 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient speech coding in real-time communications, though it appears incremental by combining existing tasks into a single network.

The paper tackles low-latency speech coding for real-time communications by proposing TFNet, an end-to-end neural codec that integrates speech enhancement and packet loss concealment. It reports improved subjective and objective results.

Deep-learning-based methods have shown advantages over traditional ones in audio coding, but limited attention has been paid to real-time communications (RTC). This paper proposes TFNet, an end-to-end neural speech codec with low latency for RTC. It takes an encoder-temporal filtering-decoder paradigm that has seldom been investigated in audio coding. An interleaved structure is proposed for temporal filtering to capture both short-term and long-term temporal dependencies. Furthermore, with end-to-end optimization, TFNet is jointly optimized with speech enhancement and packet loss concealment, yielding a one-for-all network for three tasks. Both subjective and objective results demonstrate the efficiency of the proposed TFNet.
