AS · LG · SD · SP · Jun 19, 2022

Resource-Efficient Separation Transformer

CMU
arXiv:2206.09507v2 · 25 citations · h-index: 31
Originality: Incremental advance
AI Analysis

This work addresses the computational inefficiency of Transformer-based speech separation models. It is an incremental improvement over existing Transformer architectures rather than a new approach.

The paper tackles the high computational cost of Transformer-based speech separation by introducing the Resource-Efficient Separation Transformer (RE-SepFormer), which uses non-overlapping blocks and compact latent summaries to achieve competitive performance on WSJ0-2Mix and WHAM! datasets while scaling better in memory and inference time.

Transformers have recently achieved state-of-the-art performance in speech separation. These models, however, are computationally demanding and require a lot of learnable parameters. This paper explores Transformer-based speech separation with a reduced computational cost. Our main contribution is the development of the Resource-Efficient Separation Transformer (RE-SepFormer), a self-attention-based architecture that reduces the computational burden in two ways. First, it uses non-overlapping blocks in the latent space. Second, it operates on compact latent summaries calculated from each chunk. The RE-SepFormer reaches a competitive performance on the popular WSJ0-2Mix and WHAM! datasets in both causal and non-causal settings. Remarkably, it scales significantly better than the previous Transformer-based architectures in terms of memory and inference time, making it more suitable for processing long mixtures.
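The two cost-saving ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the "compact latent summary" is a plain mean over the time axis of each chunk, that the last chunk is zero-padded, and that attention cost is counted as the number of query-key pairs. All function names are hypothetical.

```python
from typing import List

Vector = List[float]


def split_non_overlapping(frames: List[Vector], chunk_size: int) -> List[List[Vector]]:
    """Split a latent sequence into consecutive chunks that share no frames.

    The last chunk is zero-padded so every chunk has the same length
    (padding scheme is an assumption of this sketch).
    """
    dim = len(frames[0])
    padded = list(frames)
    remainder = len(padded) % chunk_size
    if remainder:
        padded += [[0.0] * dim for _ in range(chunk_size - remainder)]
    return [padded[i:i + chunk_size] for i in range(0, len(padded), chunk_size)]


def summarize(chunk: List[Vector]) -> Vector:
    """Collapse one chunk into a single vector by averaging over time.

    Mean pooling is an assumption; the abstract only says the summaries
    are "calculated from each chunk".
    """
    n = len(chunk)
    return [sum(column) / n for column in zip(*chunk)]


def attention_pairs_full(num_frames: int) -> int:
    """Query-key pairs when self-attention spans the whole sequence."""
    return num_frames ** 2


def attention_pairs_chunked(num_frames: int, chunk_size: int) -> int:
    """Query-key pairs for attention inside each chunk plus attention
    across one summary vector per chunk."""
    num_chunks = -(-num_frames // chunk_size)  # ceiling division
    intra = num_chunks * chunk_size ** 2
    inter = num_chunks ** 2
    return intra + inter


if __name__ == "__main__":
    frames = [[float(t), 1.0] for t in range(10)]  # 10 frames, 2 features
    chunks = split_non_overlapping(frames, chunk_size=4)
    summaries = [summarize(chunk) for chunk in chunks]
    print(len(chunks), summaries[0])
    print(attention_pairs_full(4000), attention_pairs_chunked(4000, 250))
```

Under these assumptions, a 4000-frame sequence split into chunks of 250 needs 1,000,256 query-key pairs instead of 16,000,000, which is the kind of scaling advantage the abstract attributes to long mixtures.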

Code Implementations: 1 repo
