CoPE: A Lightweight Complex Positional Encoding
Of interest to NLP researchers and practitioners, this work addresses a fundamental bottleneck in transformer architectures by offering a more efficient positional encoding method.
The paper tackles positional encoding in transformers by introducing CoPE, a lightweight complex-valued encoding that places semantic content in the real part and positional information in the imaginary part. On the GLUE benchmark, the approach achieves better performance than existing methods such as RoPE, sinusoidal, and learned positional encodings, at lower computational complexity.
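The real/imaginary split can be illustrated with a minimal sketch. It assumes a real-valued token embedding table for the content part and, purely as a stand-in, the standard sinusoidal table for the positional part; the summary does not say how CoPE actually constructs its imaginary component, and the function names here are hypothetical.

```python
import numpy as np

def sinusoidal_positions(seq_len: int, dim: int) -> np.ndarray:
    """Standard sinusoidal table, used here only as a stand-in for
    whatever positional signal CoPE places in the imaginary part."""
    pos = np.arange(seq_len)[:, None]                   # (L, 1)
    i = np.arange(dim)[None, :]                         # (1, D)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def complex_embed(token_ids: np.ndarray, emb_table: np.ndarray) -> np.ndarray:
    """Hypothetical complex embedding: real part = content, imaginary part = position."""
    content = emb_table[token_ids]                      # (L, D) semantic vectors
    position = sinusoidal_positions(len(token_ids), emb_table.shape[1])
    return content + 1j * position                      # (L, D) complex

# Usage: the same token at two positions shares its real part,
# while the imaginary (positional) part differs.
rng = np.random.default_rng(0)
vocab = rng.normal(size=(100, 8))                       # toy embedding table
ids = np.array([5, 17, 42, 5])
z = complex_embed(ids, vocab)
```

Keeping content and position in separate components, rather than adding them as in the original sinusoidal scheme, means neither signal overwrites the other. Attention can then weigh them through the magnitude and phase of each complex entry.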
Recent studies have demonstrated the effectiveness of positional encoding in transformer architectures. By incorporating positional information, positional encoding provides essential guidance for modeling dependencies between elements at different sequence positions. We introduce CoPE (a lightweight Complex Positional Encoding), a novel architecture that uses complex-valued embeddings to represent both content and positional information. Our approach replaces traditional positional encodings with complex embeddings in which the real part captures semantic content and the imaginary part encodes positional information. We introduce phase-aware attention in the first layer of the transformer to capture position-dependent patterns, followed by standard attention layers at higher levels. We show that CoPE does not exhibit long-term decay and is compatible with linear attention. Experimental evaluation on the GLUE benchmark suggests that our approach achieves superior performance with lower computational complexity than RoPE, sinusoidal, and learned positional encodings.
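The abstract does not define phase-aware attention precisely. One plausible reading can be sketched under explicit assumptions: complex query/key/value projections, and logits taken as the real part of the Hermitian inner product, $\mathrm{Re}(q\,\overline{k}) = \mathrm{Re}(q)\cdot\mathrm{Re}(k) + \mathrm{Im}(q)\cdot\mathrm{Im}(k)$. Per component this equals $|q_j||k_j|\cos(\theta_{q_j} - \theta_{k_j})$, so the score depends on the phase difference between query and key. The weight matrices, the scoring rule, and the choice to hand the real part to later layers are all hypothetical, not the authors' method.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def phase_aware_attention(z, Wq, Wk, Wv):
    """Hypothetical first-layer attention over complex inputs z of shape (L, D).

    Logits are Re(q k^H) / sqrt(D): the real part of the Hermitian inner
    product, which depends on both magnitudes and the phase difference.
    """
    q, k, v = z @ Wq, z @ Wk, z @ Wv                  # complex projections (L, D)
    d = q.shape[-1]
    scores = np.real(q @ k.conj().T) / np.sqrt(d)    # (L, L) real logits
    weights = softmax(scores, axis=-1)               # real, rows sum to 1
    return weights @ v, weights                      # complex output (L, D)

# Usage with toy complex inputs and randomly initialised complex weights.
rng = np.random.default_rng(1)
L, D = 4, 8
z = rng.normal(size=(L, D)) + 1j * rng.normal(size=(L, D))

def complex_init(n: int) -> np.ndarray:
    return (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2 * n)

Wq, Wk, Wv = complex_init(D), complex_init(D), complex_init(D)
out, w = phase_aware_attention(z, Wq, Wk, Wv)
h = out.real  # one possible hand-off to standard real-valued layers (assumption)
```

Restricting the complex arithmetic to the first layer, as the abstract describes, keeps the extra cost of complex projections confined to a single layer. The remaining layers can then be ordinary real-valued attention.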