LG QM | Sep 20, 2025

Causality-Induced Positional Encoding for Transformer-Based Representation Learning of Non-Sequential Features

Kaichen Xu, Yihang Du, Mianpeng Liu, Zimu Yu, Xiaobo Sun

arXiv:2509.16629v24.12 citations | h-index: 3 | Has Code

Originality Highly original

AI Analysis

This work addresses a limitation of transformer models on real-world data with non-sequential features. Its solution could benefit domains such as healthcare or finance, though it is incremental in that it extends existing positional encoding methods.

The paper tackles positional encoding for transformers when features are non-sequential but causally related. It proposes CAPE, which generates causality-aware encodings by embedding causal DAGs in hyperbolic space, and it reports improved transformer performance on synthetic and real-world datasets.

Positional encoding is essential for supplementing transformers with positional information about tokens. Existing positional encoding methods demand a predefined token/feature order, rendering them unsuitable for real-world data with non-sequential yet causally related features. To address this limitation, we propose CAPE, a novel method that identifies the underlying causal structure over non-sequential features as a weighted directed acyclic graph (DAG) using generalized structural equation modeling. The DAG is then embedded in hyperbolic space, where its geometric structure is well preserved, using a hyperboloid model-based approach that effectively captures two important causal graph properties (causal strength and causal specificity). This step yields causality-aware positional encodings for the features, which are converted into their rotary form for integration with the transformer's self-attention mechanism. Theoretical analysis reveals that CAPE-generated rotary positional encodings possess three valuable properties for enhanced self-attention: causal distance-induced attenuation, causal generality-induced attenuation, and robustness to positional disturbances. We evaluate CAPE on both synthetic and real-world datasets, empirically demonstrating its theoretical properties and its effectiveness in enhancing transformers for data with non-sequential features. Our code is available at https://github.com/Catchxu/CAPE.
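
The two geometric ingredients named in the abstract, distance in the hyperboloid model and rotary application of per-feature positions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy vectors, and the choice of using a feature's positional encoding directly as rotation angles are all assumptions made for illustration. It uses only the standard hyperboloid distance (curvature -1) and the standard rotary property that an attention score depends on the difference between the two positions.

```python
import math

def lift_to_hyperboloid(v):
    """Map a Euclidean point v onto the unit hyperboloid (curvature -1)."""
    x0 = math.sqrt(1.0 + sum(c * c for c in v))
    return [x0] + list(v)

def lorentz_inner(x, y):
    """Minkowski inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def hyperbolic_distance(x, y):
    """Geodesic distance on the hyperboloid: arccosh(-<x, y>_L)."""
    # Clamp to guard against rounding pushing the argument below 1.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

def rotate_pairs(vec, angles):
    """Rotate consecutive 2-D pairs of vec by the given angles (rotary form)."""
    out = []
    for i, theta in enumerate(angles):
        a, b = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out += [a * c - b * s, a * s + b * c]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy data: two features with made-up embeddings and positions.
feat_a = lift_to_hyperboloid([0.3, -0.8])
feat_b = lift_to_hyperboloid([1.2, 0.5])
dist_ab = hyperbolic_distance(feat_a, feat_b)

# Assumption: each feature's positional encoding is used as rotation angles.
q, k = [1.0, 0.5, -0.3, 2.0], [0.2, -1.0, 0.7, 0.4]
pos_a, pos_b = [0.3, 1.1], [0.9, -0.4]
score = dot(rotate_pairs(q, pos_a), rotate_pairs(k, pos_b))

# Shifting both positions by the same offset leaves the score unchanged,
# because only the difference pos_a - pos_b enters the dot product.
shift = [0.5, -0.2]
pos_a2 = [p + s for p, s in zip(pos_a, shift)]
pos_b2 = [p + s for p, s in zip(pos_b, shift)]
score_shifted = dot(rotate_pairs(q, pos_a2), rotate_pairs(k, pos_b2))
```

In this sketch `score` and `score_shifted` agree up to floating-point error. How CAPE actually maps its hyperbolic embeddings to rotation angles is specified in the paper and its repository, not here.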

