Complex Transformer: A Framework for Modeling Complex-Valued Sequence
This work addresses the problem of modeling complex-valued sequences, such as speech and audio signals, for researchers and practitioners in signal processing. The contribution is incremental in that it builds on the existing transformer framework.
The paper tackles the underutilization of complex numbers in deep learning for sequence modeling by proposing a Complex Transformer that adapts attention and encoder-decoder networks for complex-valued inputs, achieving state-of-the-art performance on the MusicNet and IQ signal datasets.
While deep learning has received a surge of interest in a variety of fields in recent years, major deep learning models rarely use complex numbers. However, speech, signal and audio data are naturally complex-valued after a Fourier transform, and studies have shown that complex-valued networks offer a potentially richer representation. In this paper, we propose a Complex Transformer, which uses the transformer model as a backbone for sequence modeling; we also develop attention and encoder-decoder networks that operate on complex input. The model achieves state-of-the-art performance on the MusicNet dataset and an In-phase Quadrature (IQ) signal dataset.
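To make the setting concrete, the following is a minimal NumPy sketch, not the paper's implementation. It shows that a real-valued signal becomes complex-valued after a Fourier transform, and how a dot-product attention score between complex queries and keys could be assembled from four real-valued matrix products. The function name `complex_scores`, the toy data, the use of a plain transpose rather than a conjugate transpose, and the scaling by the square root of the feature dimension are all assumptions made for illustration. The abstract does not specify how scores are normalized, so no softmax is applied here.

```python
import numpy as np

def complex_scores(q, k):
    """Scaled scores Q K^T for complex Q and K, built from real matrix products.

    Hypothetical helper: with Q = A + iB and K = C + iD,
    Q K^T = (A C^T - B D^T) + i (A D^T + B C^T).
    """
    a, b = q.real, q.imag
    c, d = k.real, k.imag
    real = a @ c.T - b @ d.T
    imag = a @ d.T + b @ c.T
    return (real + 1j * imag) / np.sqrt(q.shape[-1])

# Toy data: 6 frames of 16 real-valued samples each.
rng = np.random.default_rng(0)
signal = rng.standard_normal((6, 16))

# The real FFT of each frame yields complex-valued features.
frames = np.fft.rfft(signal, axis=-1)

# Self-attention style scores between all pairs of frames.
scores = complex_scores(frames, frames)
```

Splitting the product into real and imaginary parts means each term is an ordinary real matrix multiplication, which is one way to reuse real-valued building blocks on complex input.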