LG · AI · CV · QUANT-PH · Nov 18, 2023

Deep Tensor Network

arXiv:2311.11091v33.81 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This paper addresses a fundamental scalability problem for foundation models that require unbounded context lengths, representing a new paradigm rather than an incremental improvement.

The paper tackles the quadratic complexity bottleneck of Transformer attention by introducing Deep Tensor Network, a new architectural framework that reformulates attention using tensor algebra to capture higher-order dependencies while enabling O(d²) per-token updates, rivaling State Space Models in efficiency.

The quadratic complexity of dot-product attention introduced in the Transformer remains a fundamental bottleneck impeding the progress of foundation models toward unbounded context lengths. Addressing this challenge, we introduce the Deep Tensor Network, a new architectural framework that fundamentally reformulates attention by unifying the expressive power of tensor algebra with neural network design. Our approach moves beyond both conventional dot-product attention and subsequent linear-time approximations to capture higher-order statistical dependencies. We introduce two core operators derived from this framework: Tensor Attention, which models complex token-mixing via data-dependent polynomial kernels, and Tensor Interaction, a novel mechanism for adaptive channel-mixing. We demonstrate that these operators are powered by second-order summaries that entirely bypass the formation of $n \times n$ matrices, enabling a causality-preserving streaming implementation with $O(d^2)$ per-token updates and $O(d^2)$ state. This efficiency rivals that of modern State Space Models while retaining an attention-like formulation. The Deep Tensor Network thus provides a principled and powerful new class of building blocks for next-generation sequence models, bridging the gap between scalable computation and rich, expressive interaction modeling.
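To make the streaming claim concrete, the following is a minimal sketch of how a running second-order summary can replace an explicit $n \times n$ score matrix. It is not the paper's Tensor Attention operator: it assumes the simplest case, an unnormalized degree-1 kernel $q_t \cdot k_s$, and every function name here (stream_step, streaming_attention, quadratic_reference) is hypothetical. The state is a single $d \times d$ matrix $S_t = \sum_{s \le t} k_s v_s^\top$, so each token costs $O(d^2)$ time and memory regardless of sequence length.

```python
import random

def stream_step(state, q, k, v):
    """Fold one token into the d x d state, then read out with the query."""
    d = len(k)
    # Rank-one update S += k v^T: O(d^2) work, independent of sequence length.
    for i in range(d):
        for j in range(d):
            state[i][j] += k[i] * v[j]
    # Readout y = S^T q, also O(d^2).
    return [sum(q[i] * state[i][j] for i in range(d)) for j in range(d)]

def streaming_attention(qs, ks, vs):
    """Causal pass over the sequence carrying only the d x d summary."""
    d = len(ks[0])
    state = [[0.0] * d for _ in range(d)]
    return [stream_step(state, q, k, v) for q, k, v in zip(qs, ks, vs)]

def quadratic_reference(qs, ks, vs):
    """Same output via explicit causal pairwise scores (for checking only)."""
    n, d = len(qs), len(vs[0])
    out = []
    for t in range(n):
        y = [0.0] * d
        for s in range(t + 1):  # causal mask: only positions s <= t
            score = sum(a * b for a, b in zip(qs[t], ks[s]))
            for j in range(d):
                y[j] += score * vs[s][j]
        out.append(y)
    return out

# Toy data (assumed sizes, chosen only for illustration).
random.seed(0)
n, d = 6, 4
qs, ks, vs = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
              for _ in range(3))

fast = streaming_attention(qs, ks, vs)
slow = quadratic_reference(qs, ks, vs)
max_err = max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb))
```

The two routines agree up to floating-point rounding because $\sum_{s \le t} (q_t \cdot k_s)\, v_s = S_t^\top q_t$. Higher-degree polynomial kernels and any normalization the paper uses would change the contents of the summary, but not the idea of carrying a fixed-size state forward token by token.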

