CV LG Jun 14, 2024

LieRE: Lie Rotational Positional Encodings

Sophie Ostmeier, Brian Axelrod, Maya Varma, Michael E. Moseley, Akshay Chaudhari, Curtis Langlotz

arXiv:2406.10322v515.811 citations Has Code

Originality: Highly original

AI Analysis

This work addresses the problem of modeling spatial structure in transformers for vision tasks, offering a novel method that builds incrementally on RoPE.

The authors address a limitation of Rotary Position Encoding (RoPE) in transformers for high-dimensional data by introducing LieRE, a learnable generalization that maps Lie algebra elements to Lie group elements. LieRE improved performance on 2D and 3D vision tasks and generalized better to higher resolutions while maintaining efficiency.

Transformer architectures rely on position encodings to model the spatial structure of input data. Rotary Position Encoding (RoPE) is a widely used method in language models that encodes relative positions through fixed, block-diagonal rotation matrices applied to key-query interactions. We hypothesize that this inductive bias limits RoPE's effectiveness for modalities with high-dimensional structure. Lie Relative Encodings (LieRE) introduce a principled generalization of RoPE, aimed at increasing the representational capacity of positional encodings in transformers. Instead of fixed 2D rotations, LieRE learns dense skew-symmetric matrices (Lie algebra elements), which are then differentiably mapped to form high-dimensional rotation matrices (Lie group elements). This results in richer, learnable, and continuous encodings of both relative and absolute positional information. We demonstrate the effectiveness of LieRE on 2D and 3D vision tasks, showing that it generalizes well to higher input resolutions while maintaining computational efficiency. The code and checkpoints are publicly available at https://github.com/StanfordMIMI/LieRE.
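The mapping described in the abstract, from skew-symmetric matrices to rotation matrices, can be illustrated with a minimal, dependency-free sketch. It is not taken from the authors' repository. It assumes one skew-symmetric generator per position axis and a rotation R(p) = exp(sum_i p_i A_i), and it uses hand-picked generator values where a real model would learn them. All function names are hypothetical, and a practical implementation would likely use a tensor library's matrix exponential instead of the small Taylor-series routine here.

```python
def skew_symmetric(params, d):
    """Fill a d x d skew-symmetric matrix (A^T = -A) from d*(d-1)/2 free parameters."""
    assert len(params) == d * (d - 1) // 2
    A = [[0.0] * d for _ in range(d)]
    k = 0
    for i in range(d):
        for j in range(i + 1, d):
            A[i][j] = params[k]
            A[j][i] = -params[k]
            k += 1
    return A


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def expm(A, terms=20, squarings=6):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    d = len(A)
    scaled = [[a / 2.0 ** squarings for a in row] for row in A]
    result = [[float(i == j) for j in range(d)] for i in range(d)]
    term = [row[:] for row in result]
    for n in range(1, terms + 1):
        # term holds scaled^n / n!
        term = [[v / n for v in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def position_rotation(generators, position):
    """Assumed form: R(p) = exp(sum_i p_i * A_i), one generator A_i per position axis."""
    d = len(generators[0])
    G = [[sum(p * A[r][c] for p, A in zip(position, generators)) for c in range(d)]
         for r in range(d)]
    return expm(G)


def attention_score(generators, q, k, p_q, p_k):
    """Dot product of a query and a key after rotating each by its own position."""
    rq = matvec(position_rotation(generators, p_q), q)
    rk = matvec(position_rotation(generators, p_k), k)
    return sum(a * b for a, b in zip(rq, rk))


# Placeholder values standing in for learned parameters (head dimension 4, 2D positions).
dense_generators = [
    skew_symmetric([0.3, -0.2, 0.1, 0.4, -0.5, 0.2], 4),
    skew_symmetric([0.1, 0.25, -0.3, -0.15, 0.2, 0.35], 4),
]
# RoPE-like special case: each axis rotates its own 2x2 block, so the generators commute.
rope_like_generators = [
    skew_symmetric([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], 4),
    skew_symmetric([0.0, 0.0, 0.0, 0.0, 0.0, 1.0], 4),
]
q = [1.0, 0.5, -0.3, 0.8]
k = [0.2, -0.7, 0.4, 0.1]
```

Because a linear combination of skew-symmetric matrices is itself skew-symmetric, its exponential is a rotation, so rotated queries and keys keep their norms. In the block-diagonal case the generators commute and the score depends only on the offset between positions, as in RoPE. With dense generators that property no longer holds in general, which is consistent with the abstract's description of encodings that carry both relative and absolute information.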
