SD LG AS
Oct 25, 2017

End-to-End Optimized Speech Coding with Deep Neural Networks

arXiv:1710.09064v3
74 citations
Originality: Highly original
AI Analysis

The model offers an automated alternative to hand-designed speech coding standards for audio applications, and it is faster to develop: it trains in hours, whereas the standards took years of engineering.

The paper tackles speech compression with an end-to-end deep neural network that optimizes the entire wideband coding pipeline from raw speech data, performing on par with the AMR-WB standard at bitrates from roughly 9 kbps to 24 kbps.

Modern compression algorithms are often the result of laborious domain-specific research; industry standards such as MP3, JPEG, and AMR-WB took years to develop and were largely hand-designed. We present a deep neural network model which optimizes all the steps of a wideband speech coding pipeline (compression, quantization, entropy coding, and decompression) end-to-end directly from raw speech data -- no manual feature engineering necessary, and it trains in hours. In testing, our DNN-based coder performs on par with the AMR-WB standard at a variety of bitrates (~9 kbps up to ~24 kbps). It also runs in real time on a 3.8 GHz Intel CPU.
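To make the four stages named in the abstract concrete, here is a minimal Python sketch of how compression, quantization, entropy coding, and decompression fit together and how a bitrate in kbps falls out of them. It is not the paper's model: the learned encoder and decoder are replaced by fixed toy functions (block averaging and sample repetition), entropy coding is only estimated through empirical Shannon entropy, and the constants `LEVELS` and `STRIDE` as well as all function names are hypothetical values chosen for illustration. The 16 kHz sampling rate is the usual rate for wideband speech.

```python
import math
from collections import Counter

SAMPLE_RATE_HZ = 16_000  # usual sampling rate for wideband speech
LEVELS = 32              # hypothetical number of quantization bins
STRIDE = 4               # hypothetical downsampling factor of the toy encoder


def encode(samples):
    """Toy stand-in for a learned encoder: average each block of STRIDE samples."""
    return [sum(samples[i:i + STRIDE]) / STRIDE
            for i in range(0, len(samples) - STRIDE + 1, STRIDE)]


def quantize(codes):
    """Map each value in [-1, 1] to an integer bin index in [0, LEVELS - 1]."""
    return [min(LEVELS - 1, max(0, round((c + 1) / 2 * (LEVELS - 1))))
            for c in codes]


def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol (memoryless estimate)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(k / n * math.log2(k / n) for k in counts.values())


def decode(symbols):
    """Toy stand-in for a learned decoder: dequantize, then repeat each value."""
    out = []
    for s in symbols:
        out.extend([s / (LEVELS - 1) * 2 - 1] * STRIDE)
    return out


def estimated_bitrate_kbps(symbols):
    """Symbols per second times estimated bits per symbol, in kbps."""
    symbols_per_second = SAMPLE_RATE_HZ / STRIDE
    return symbols_per_second * entropy_bits(symbols) / 1000


# One second of a 200 Hz tone as placeholder input (not real speech).
speech = [0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE_HZ)
          for t in range(SAMPLE_RATE_HZ)]
symbols = quantize(encode(speech))
reconstruction = decode(symbols)
print(f"estimated bitrate: {estimated_bitrate_kbps(symbols):.1f} kbps")
```

In this sketch the bitrate is controlled by two knobs: how many symbols per second the encoder emits and how many bits each symbol costs after entropy coding. An end-to-end approach, as the abstract describes it, would train the encoder and decoder jointly with the quantization and entropy-coding stages instead of fixing them by hand as done here.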
