CL · AI · Nov 11, 2024

TreeCoders: Trees of Transformers

arXiv:2411.07218v1 · h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses efficiency bottlenecks in transformer models for natural language processing. The contribution appears incremental, since it builds on existing transformer architectures.

The paper aims to improve transformer efficiency by introducing TreeCoders, a family of transformer trees that replace the linear stack of transformer blocks with a complete k-ary tree. The models achieve competitive results on language datasets and outperform a size-equivalent linear transformer 76% of the time over a wide range of tree architectures.

In this paper, we introduce TreeCoders, a novel family of transformer trees. We move away from traditional linear transformers to complete k-ary trees. Transformer blocks serve as nodes, and generic classifiers learn to select the best child and route the sequence of tokens to a specific leaf. The selectors, which sit outside the transformer blocks, allow a variety of architectures to be used without further modification. Furthermore, our proposed architecture supports sparse node activation due to the logarithmic complexity of a tree search. We validate our idea by testing a series of decoder-only tree transformers, achieving competitive results across a diverse range of language datasets. Our study demonstrates that the proposed tree transformer model outperforms a size-equivalent linear transformer model 76% of the time over a wide range of tree architectures. Furthermore, our proposed model naturally lends itself to distributed implementation.
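The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `Node`, `toy_block`, `make_toy_selector`, `build_complete_tree`, `count_nodes`, and `route` are hypothetical names, the block is a toy function standing in for a transformer block, and the selector is a fixed rule standing in for a learned classifier. The sketch only shows that a sequence visits one node per level, so the number of active nodes grows with the tree height rather than with the total node count.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Tokens = List[float]  # toy stand-in for a sequence of token representations


@dataclass
class Node:
    """One tree node: a block plus an external selector that picks a child."""
    name: str
    block: Callable[[Tokens], Tokens]                    # stand-in for a transformer block
    selector: Optional[Callable[[Tokens], int]] = None   # None on leaves
    children: List["Node"] = field(default_factory=list)


def toy_block(tokens: Tokens) -> Tokens:
    # Assumption: a real node would apply a transformer block here.
    return [t + 1.0 for t in tokens]


def make_toy_selector(k: int) -> Callable[[Tokens], int]:
    # Assumption: a real selector would be a learned classifier over the sequence.
    def select(tokens: Tokens) -> int:
        return int(sum(tokens)) % k
    return select


def build_complete_tree(k: int, height: int, name: str = "root") -> Node:
    """Build a complete k-ary tree; height counts edges from root to leaf."""
    node = Node(name=name, block=toy_block)
    if height > 0:
        node.selector = make_toy_selector(k)
        node.children = [
            build_complete_tree(k, height - 1, f"{name}.{i}") for i in range(k)
        ]
    return node


def count_nodes(node: Node) -> int:
    """Total number of nodes (blocks) stored in the tree."""
    return 1 + sum(count_nodes(child) for child in node.children)


def route(root: Node, tokens: Tokens) -> Tuple[Tokens, List[str]]:
    """Run the sequence from the root to a single leaf, one node per level."""
    path: List[str] = []
    node = root
    while True:
        tokens = node.block(tokens)
        path.append(node.name)
        if not node.children:
            return tokens, path
        node = node.children[node.selector(tokens)]


if __name__ == "__main__":
    tree = build_complete_tree(k=2, height=3)
    output, path = route(tree, [1.0, 2.0])
    print("total nodes:", count_nodes(tree))
    print("active nodes:", len(path))
    print("path:", " -> ".join(path))
```

With `k=2` and `height=3`, the tree stores 15 nodes but a single sequence activates only 4 of them, one per level.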

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
