SD · Nov 29, 2020

A comparison of handcrafted, parameterized, and learnable features for speech separation

arXiv:2011.14295v2
AI Analysis

This work provides a systematic comparison of feature types for speech separation, which is useful for researchers designing new separation networks.

This paper compares handcrafted, parameterized, and learnable acoustic features for speech separation within the Conv-Tasnet framework. The authors find that when the decoder is learnable, all four encoder choices (STFT, MPGTF, ParaMPGTF, and learnable features) yield similar performance. When the decoder is instead the pseudo-inverse transform of the encoder, the proposed parameterized MPGTF (ParaMPGTF) outperforms the handcrafted STFT and MPGTF.

The design of acoustic features is important for speech separation. Acoustic features can be roughly categorized into three classes: handcrafted, parameterized, and learnable. Among them, learnable features, which are trained jointly with the separation network in an end-to-end fashion, have become a new trend in modern speech separation research, e.g. the convolutional time-domain audio separation network (Conv-Tasnet), while handcrafted and parameterized features have also been shown to be competitive in very recent studies. However, a systematic comparison across the three kinds of acoustic features has not yet been conducted. In this paper, we compare them in the framework of Conv-Tasnet by setting its encoder and decoder to different acoustic features. We also generalize the handcrafted multi-phase gammatone filterbank (MPGTF) to a new parameterized multi-phase gammatone filterbank (ParaMPGTF). Experimental results on the WSJ0-2mix corpus show that (i) if the decoder is learnable, then setting the encoder to STFT, MPGTF, ParaMPGTF, or learnable features leads to similar performance; and (ii) when the pseudo-inverse transforms of STFT, MPGTF, and ParaMPGTF are used as the decoders, the proposed ParaMPGTF performs better than the two handcrafted features, STFT and MPGTF.
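
The pseudo-inverse decoder mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the paper's implementation. The function names (`stft_like_filterbank`, `encode`, `pinv_decode`) are hypothetical, the filterbank is a plain real-valued DFT basis standing in for the STFT encoder, and the frames are non-overlapping for simplicity, whereas Conv-Tasnet itself uses overlapping frames.

```python
import numpy as np

def stft_like_filterbank(frame_len):
    """Fixed (handcrafted) encoder: cosine and sine rows for each DFT bin."""
    n = np.arange(frame_len)
    k = np.arange(frame_len // 2 + 1)[:, None]
    angle = 2.0 * np.pi * k * n / frame_len
    # Shape: (2 * (frame_len // 2 + 1), frame_len)
    return np.concatenate([np.cos(angle), np.sin(angle)], axis=0)

def encode(signal, filterbank):
    """Cut the signal into non-overlapping frames and project onto the filters."""
    frame_len = filterbank.shape[1]
    num_frames = len(signal) // frame_len
    frames = signal[: num_frames * frame_len].reshape(num_frames, frame_len)
    return frames @ filterbank.T  # (num_frames, num_filters)

def pinv_decode(coeffs, filterbank):
    """Decoder fixed to the Moore-Penrose pseudo-inverse of the encoder."""
    decoder = np.linalg.pinv(filterbank.T)  # (num_filters, frame_len)
    return (coeffs @ decoder).reshape(-1)

# Round trip on random data (illustrative only, no separation network involved).
rng = np.random.default_rng(0)
x = rng.standard_normal(16 * 40)
W = stft_like_filterbank(16)
x_hat = pinv_decode(encode(x, W), W)
max_error = np.max(np.abs(x - x_hat))
```

Because the rows of this filterbank span every length-16 frame, encoding followed by the pseudo-inverse decoder reproduces the input up to floating-point error. Any other fixed filter matrix with full column rank, such as a gammatone-based one, could be passed to the same two functions. In a separation system, a mask estimated by the network would be applied to the coefficients between `encode` and `pinv_decode`.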
