AISep 6, 2025

TreeGPT: Pure TreeFFN Encoder-Decoder Architecture for Structured Reasoning Without Attention Mechanisms

arXiv:2509.05550v21 citationsh-index: 1

Originality Incremental advance

AI Analysis

This addresses computational efficiency and reasoning capabilities for structured reasoning tasks, but it is incremental as it builds on existing encoder-decoder designs without attention.

The paper tackles structured reasoning by proposing TreeGPT, an attention-free neural architecture using pure TreeFFN encoder-decoder components, achieving 99% validation accuracy on the ARC Prize 2025 dataset with 3.16M parameters and converging in 1500 training steps.

We present TreeGPT, an attention-free neural architecture that explores the potential of pure TreeFFN encoder-decoder design for structured reasoning tasks. Unlike traditional transformer approaches that rely on attention mechanisms, TreeGPT employs bidirectional TreeFFN components that process sequences through adjacent connections in parallel, aiming to achieve computational efficiency while maintaining reasoning capabilities. Our approach centers on a TreeFFN Encoder-Decoder mechanism: $$\text{Encoder TreeFFN (L} \rightarrow \text{R)} + \text{Decoder TreeFFN (R} \leftarrow \text{L)} \rightarrow \text{Parallel Processing}$$ where the encoder processes left-to-right dependencies while the decoder handles right-to-left patterns, both using simple neighbor-to-neighbor connections. This design eliminates attention computation while maintaining sequence modeling capabilities. We evaluate our approach on the ARC Prize 2025 dataset, where TreeGPT achieves 99\% validation accuracy using 3.16M parameters. The model converges within 1500 training steps and demonstrates 100\% token-level accuracy on selected evaluation samples. Our preliminary results suggest that for certain structured reasoning tasks, specialized TreeFFN architectures may offer advantages over attention-based approaches. While these findings are encouraging, we acknowledge that further investigation across diverse tasks and datasets would be valuable to establish the broader applicability of attention-free designs.

View on arXiv PDF

Similar