SD, LG, AS · Dec 25, 2018

Tensor-Train Long Short-Term Memory for Monaural Speech Enhancement

arXiv:1812.10095v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses efficient speech enhancement on low-end devices such as mobile phones. It is an incremental advance: it builds on existing LSTM methods and adds parameter reduction.

The paper tackles the high parameter count and computational cost of LSTM networks for monaural speech enhancement. It proposes a Tensor-Train factorized LSTM (TT-LSTM) model that achieves performance competitive with state-of-the-art uncompressed RNN models while being orders of magnitude less complex.

In recent years, Long Short-Term Memory (LSTM) has become a popular choice for speech separation and speech enhancement tasks. The capability of an LSTM network can be enhanced by widening it and adding more layers. However, this introduces millions of parameters into the network and increases the computational resources required. These limitations hinder the efficient implementation of RNN models on low-end devices such as mobile phones and embedded systems with limited memory. To overcome these issues, we proposed an efficient alternative that reduces the parameter count by representing the LSTM weight matrices in Tensor-Train (TT) format. We called this Tensor-Train factorized LSTM the TT-LSTM model. Based on these TT-LSTM units, we proposed a deep TensorNet model for the single-channel speech enhancement task. Experimental results under various test conditions, measured with standard speech quality and intelligibility metrics, demonstrated that the proposed deep TT-LSTM based speech enhancement framework can achieve performance competitive with the state-of-the-art uncompressed RNN model, even though the proposed model architecture is orders of magnitude less complex.
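To make the parameter-reduction idea concrete, here is a minimal NumPy sketch of a weight matrix stored in TT format. It is an illustration under stated assumptions, not the paper's implementation: the mode sizes, the TT-ranks, and the helper names (`tt_param_count`, `make_tt_cores`, `tt_to_dense`, `tt_matvec`) are all hypothetical and chosen only for the example. A matrix of shape (m1·m2·m3) × (n1·n2·n3) is represented by small 4-way cores of shape (r_prev, m_k, n_k, r_next), and the matrix-vector product is computed core by core without ever building the dense matrix.

```python
import numpy as np


def tt_param_count(out_modes, in_modes, ranks):
    """Number of parameters held by the TT cores (ranks has len(modes) + 1 entries)."""
    return sum(
        ranks[k] * out_modes[k] * in_modes[k] * ranks[k + 1]
        for k in range(len(out_modes))
    )


def make_tt_cores(out_modes, in_modes, ranks, seed=0):
    """Random TT cores; core k has shape (r_k, m_k, n_k, r_{k+1})."""
    rng = np.random.default_rng(seed)
    return [
        rng.standard_normal((ranks[k], out_modes[k], in_modes[k], ranks[k + 1]))
        for k in range(len(out_modes))
    ]


def tt_to_dense(cores):
    """Rebuild the full matrix from its cores (for checking only)."""
    full = np.ones((1, 1, 1))  # (rows so far, cols so far, current rank)
    for core in cores:
        _, m, n, r_next = core.shape
        rows, cols, _ = full.shape
        full = np.einsum("abr,rmns->ambns", full, core)
        full = full.reshape(rows * m, cols * n, r_next)
    return full[:, :, 0]


def tt_matvec(cores, x):
    """Compute W @ x directly from the cores, without forming W."""
    state = x.reshape(1, 1, -1)  # (output rows done, current rank, inputs left)
    for core in cores:
        r_prev, m, n, r_next = core.shape
        done = state.shape[0]
        state = state.reshape(done, r_prev, n, -1)  # split off this input mode
        state = np.einsum("arnb,rmns->amsb", state, core)  # contract rank and mode
        state = state.reshape(done * m, r_next, -1)
    return state.reshape(-1)


# Assumed example: a 256 x 256 weight matrix with modes (4, 8, 8) and TT-ranks (1, 4, 4, 1).
out_modes, in_modes, ranks = (4, 8, 8), (4, 8, 8), (1, 4, 4, 1)
cores = make_tt_cores(out_modes, in_modes, ranks)
x = np.random.default_rng(1).standard_normal(int(np.prod(in_modes)))

dense_params = int(np.prod(out_modes)) * int(np.prod(in_modes))
print("dense parameters:", dense_params)
print("TT parameters:   ", tt_param_count(out_modes, in_modes, ranks))
print("matvec matches dense:", np.allclose(tt_to_dense(cores) @ x, tt_matvec(cores, x)))
```

With these assumed sizes the cores hold far fewer values than the dense matrix. The TT-ranks control the trade-off: larger ranks give a more expressive matrix at a higher parameter cost.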
