LG · AI · Feb 2

Tabula RASA: Exposing and Breaking the Relational Bottleneck in Transformers

arXiv:2602.02834v1
Originality: Incremental advance
AI Analysis

The paper addresses a bottleneck in transformers on tasks that require complex relational reasoning and offers a practical improvement for applications such as question answering. The advance is incremental because it builds on existing transformer architectures.

The paper tackles transformers' difficulty with multi-hop relational reasoning over structured data. It introduces RASA, a minimal modification that adds edge-type embeddings and sparse masking, and reports improved performance on MetaQA and WebQuestionsSP, including a +7.1 point advantage on 3-hop reasoning.

Transformers achieve remarkable performance across many domains, yet struggle with tasks requiring multi-hop relational reasoning over structured data. We analyze this limitation through circuit complexity: standard transformers are $\mathsf{TC}^0$-complete and require $\Omega(k)$ layers for $k$-hop reasoning. We introduce RASA (Relation-Aware Sparse Attention), a minimal modification adding: (1) edge-type embeddings that inject relational structure into attention scores, and (2) sparse masking that restricts attention to graph-adjacent positions. While RASA has the same asymptotic depth requirements, sparse masking reduces the attention search space from $O(2^{n^2})$ to $O(2^m)$ patterns, and edge biases provide explicit relation routing. Empirically, on MetaQA (1/2/3-hop) and WebQuestionsSP, RASA outperforms standard transformers and matches GPT-4 at lower cost, with advantages growing with reasoning depth (+7.1 points on 3-hop). We do not claim formal learnability guarantees; the contribution is empirical validation that minimal structural modifications substantially improve multi-hop reasoning.
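
The two modifications named in the abstract can be illustrated with a minimal single-head sketch in plain Python. This is not the authors' implementation. The function name `rasa_attention`, the scalar per-relation bias `edge_bias`, the toy graph, and the reading of $m$ as the number of allowed (query, key) pairs including self-loops are all assumptions made for illustration. The sketch adds a relation-dependent bias to each attention score and skips every pair that is not graph-adjacent.

```python
import math

def rasa_attention(q, k, v, adj, edge_type, edge_bias):
    """Single-head attention with an edge-type bias and an adjacency mask.

    Hypothetical sketch, not the paper's code.
    q, k, v: lists of n vectors (lists of floats).
    adj[i][j]: True if position i may attend to position j
               (assumed to include a self-loop for every i).
    edge_type[i][j]: integer relation id for the pair (i, j).
    edge_bias[r]: scalar bias for relation id r (learned in a real model).
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        # Sparse masking: score only graph-adjacent positions.
        scores = {}
        for j in range(n):
            if adj[i][j]:
                dot = sum(a * b for a, b in zip(q[i], k[j]))
                # Edge-type bias: shift the score by the relation's bias.
                scores[j] = dot / math.sqrt(d) + edge_bias[edge_type[i][j]]
        # Numerically stable softmax over the allowed positions only.
        top = max(scores.values())
        weights = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(weights.values())
        out.append([
            sum(w / total * v[j][c] for j, w in weights.items())
            for c in range(len(v[0]))
        ])
    return out

# Toy directed path graph 0 -> 1 -> 2 with self-loops.
adj = [
    [True, True, False],
    [False, True, True],
    [False, False, True],
]
# Relation ids (illustrative): 0 = self-loop, 1 = forward edge.
edge_type = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
edge_bias = [0.0, 2.0]
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

out = rasa_attention(q, k, v, adj, edge_type, edge_bias)

# Counting binary attention patterns, as in the abstract's search-space claim.
n = len(adj)
m = sum(sum(row) for row in adj)   # allowed (query, key) pairs
dense_patterns = 2 ** (n * n)      # every pair free to be on or off
sparse_patterns = 2 ** m           # only allowed pairs free
```

In this toy graph, position 2 can attend only to itself, so its output is exactly its own value vector, while position 0 is pulled toward position 1 by the forward-edge bias. With $n = 3$ and $m = 5$, the pattern count drops from $2^9$ to $2^5$.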

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
