Building Blocks for a Complex-Valued Transformer Architecture
This work addresses the need for deep learning that operates directly on complex-valued data in domains such as medical imaging and signal processing, though its contribution is incremental, since it adapts existing transformer components.
The paper tackles the problem of applying deep learning to complex-valued signals, such as MRI data or Fourier-transformed signals, by developing building blocks for a complex-valued transformer architecture. On the MusicNet dataset, it shows improved robustness to overfitting while maintaining on-par performance with the real-valued transformer.
Most deep learning pipelines are built on real-valued operations to deal with real-valued inputs such as images, speech, or music signals. However, many applications naturally make use of complex-valued signals or images, such as MRI or remote sensing. Additionally, the Fourier transform of a signal is complex-valued and has numerous applications. We aim to make deep learning directly applicable to these complex-valued signals without using projections into $\mathbb{R}^2$. Thus, we add to the recent developments in complex-valued neural networks by presenting building blocks that transfer the transformer architecture to the complex domain. We present multiple versions of a complex-valued Scaled Dot-Product Attention mechanism as well as a complex-valued layer normalization. We test on a classification task and a sequence generation task on the MusicNet dataset and show improved robustness to overfitting while maintaining on-par performance compared to the real-valued transformer architecture.
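To make the two building blocks named above more concrete, the following is a minimal sketch in plain Python, not the paper's implementation. The abstract does not specify the attention variants or the normalization formula, so everything here is an assumption chosen for illustration: the function names `complex_attention` and `complex_layer_norm` are hypothetical, the similarity score is taken to be the real part of the Hermitian inner product between query and key, and the normalization simply centers by the complex mean and scales by the standard deviation of the centered magnitudes.

```python
import math


def complex_attention(Q, K, V):
    """One possible complex-valued scaled dot-product attention (illustrative).

    Q, K, V are lists of vectors; each vector is a list of Python complex numbers.
    Assumption: similarity = Re(<q, k>) with the Hermitian inner product.
    """
    d = len(Q[0])
    out = []
    for q in Q:
        # Hermitian inner product sum_i q_i * conj(k_i), real part, scaled by sqrt(d).
        scores = [
            sum(qi * ki.conjugate() for qi, ki in zip(q, k)).real / math.sqrt(d)
            for k in K
        ]
        # Numerically stable softmax over the real-valued scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Real attention weights mix the complex value vectors.
        out.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return out


def complex_layer_norm(x, eps=1e-5):
    """Simplified complex layer normalization over one feature vector (illustrative).

    Assumption: subtract the complex mean, then divide by sqrt(E|z - mean|^2 + eps).
    """
    mean = sum(x) / len(x)
    centered = [z - mean for z in x]
    var = sum(abs(z) ** 2 for z in centered) / len(x)
    return [z / math.sqrt(var + eps) for z in centered]


# Small usage example with hand-picked values.
Q = [[1 + 1j, 0j]]
K = [[1 + 1j, 0j], [-1 - 1j, 0j]]
V = [[1j], [2 + 0j]]
attended = complex_attention(Q, K, V)
normalized = complex_layer_norm([1 + 1j, -1 - 1j, 2j, -2j])
print(attended, normalized)
```

Because the softmax here acts on real scores, the output stays a convex combination of the complex value vectors. Other variants are possible, for example using the magnitude of the inner product instead of its real part; which of these the paper evaluates is not stated in the abstract.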