AS LG SD SP
Jun 19, 2022

Resource-Efficient Separation Transformer

Luca Della Libera, Cem Subakan, Mirco Ravanelli, Samuele Cornell, Frédéric Lepoutre, François Grondin

CMU

arXiv:2206.09507v2 | 10.825 citations | h-index: 31 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the computational inefficiency of Transformer-based speech separation models. It is an incremental improvement over existing Transformer architectures, aimed at researchers and practitioners who need to process long mixtures.

The paper tackles the high computational cost of Transformer-based speech separation by introducing the Resource-Efficient Separation Transformer (RE-SepFormer), which uses non-overlapping blocks and compact latent summaries to achieve competitive performance on WSJ0-2Mix and WHAM! datasets while scaling better in memory and inference time.

Transformers have recently achieved state-of-the-art performance in speech separation. These models, however, are computationally demanding and require a lot of learnable parameters. This paper explores Transformer-based speech separation with a reduced computational cost. Our main contribution is the development of the Resource-Efficient Separation Transformer (RE-SepFormer), a self-attention-based architecture that reduces the computational burden in two ways. First, it uses non-overlapping blocks in the latent space. Second, it operates on compact latent summaries calculated from each chunk. The RE-SepFormer reaches a competitive performance on the popular WSJ0-2Mix and WHAM! datasets in both causal and non-causal settings. Remarkably, it scales significantly better than the previous Transformer-based architectures in terms of memory and inference time, making it more suitable for processing long mixtures.
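The two ideas named in the abstract, non-overlapping chunks and one compact summary per chunk, can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation: the function names `split_non_overlapping` and `summarize` are hypothetical, averaging over time is only an assumed choice of summary, and the toy values are illustrative.

```python
from statistics import fmean


def split_non_overlapping(frames, chunk_size):
    """Split a list of feature vectors into consecutive, non-overlapping chunks.

    The last chunk may be shorter than chunk_size (no padding is applied here).
    """
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


def summarize(chunk):
    """Collapse one chunk into a single vector by averaging over time.

    Mean pooling is an assumption made for this sketch.
    """
    return [fmean(column) for column in zip(*chunk)]


# Toy latent sequence: 10 frames with 2 features each (illustrative values only).
frames = [[float(t), float(2 * t)] for t in range(10)]
chunks = split_non_overlapping(frames, chunk_size=4)
summaries = [summarize(c) for c in chunks]

print(len(chunks))  # 3 chunks: frames 0-3, 4-7, 8-9
print(summaries)    # [[1.5, 3.0], [5.5, 11.0], [8.5, 17.0]]
```

In this sketch, any model that relates chunks to one another would see a sequence of only 3 summary vectors instead of 10 frames, which is the kind of reduction that makes long inputs cheaper to process.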

