CL AI LG | Jun 21, 2025

TPTT: Transforming Pretrained Transformers into Titans

arXiv:2506.17671v2 | 1 citation | h-index: 1 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses efficiency challenges in deploying LLMs in resource-limited environments, though it appears incremental because it builds on existing linear attention and fine-tuning techniques.

The paper tackles the quadratic computational and memory cost of Transformer-based LLMs, which hinders efficient inference on long contexts. It introduces TPTT, a framework that augments pretrained models with linearized attention and memory gating. Reported results suggest potential improvements, such as an up to 20% relative increase in Exact Match on MMLU for a 1B-parameter model in one-shot evaluation.
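
The "relative increase" wording is easy to misread: it is a ratio against the baseline score, not an absolute gain in percentage points. The symbols below are notation introduced here for illustration, and the numbers in the comment are purely hypothetical, not results from the paper.

```latex
\[
\text{relative increase}
  = \frac{\mathrm{EM}_{\text{TPTT}} - \mathrm{EM}_{\text{base}}}{\mathrm{EM}_{\text{base}}}
\]
% Hypothetical illustration only: EM_base = 0.10 and EM_TPTT = 0.12 give
% (0.12 - 0.10) / 0.10 = 0.20, i.e. 20% relative but only 2 points absolute.
```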

Transformer-based large language models (LLMs) have achieved strong performance across many natural language processing tasks. Nonetheless, their quadratic computational and memory requirements, particularly in self-attention layers, pose challenges for efficient inference on long contexts and for deployment in resource-limited environments. We present TPTT (Transforming Pretrained Transformers into Titans), a framework designed to augment pretrained Transformers with linearized attention (LiZA) and internal memory gating via Memory as Gate (MaG), applied without full retraining. TPTT supports parameter-efficient fine-tuning (LoRA) and integrates with standard toolkits such as Hugging Face Transformers. We evaluated TPTT on several pretrained models, including Llama-1B, OlMoE-1B-7B, Qwen2.5-1.5B, Gemma3-270m, OpenELM-1.3B, and Mistral-7B, in order to assess applicability across architectures of different scales. Experiments on models with approximately 1 billion parameters, evaluated primarily on the MMLU benchmark, suggest potential improvements in both efficiency and accuracy compared to baseline models. For example, Titans-Llama-1B exhibited up to a 20% relative increase in Exact Match scores in one-shot evaluation. An additional finding is that it is possible to convert a quadratic-attention model into a purely linear-attention model using the DeltaProduct mechanism. All training runs were carried out with modest computational resources. These preliminary findings indicate that TPTT may help adapt pretrained LLMs for long-context tasks with limited overhead. Further studies on larger models and a broader set of benchmarks will be necessary to evaluate the generality and robustness of the framework. Code is available at https://github.com/fabienfrfr/tptt. Python package at https://pypi.org/project/tptt/.
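
To give a feel for what "linearized attention" combined with a memory gate could look like, here is a minimal NumPy sketch. It is not the TPTT implementation and does not use the `tptt` package API. The `elu(x) + 1` feature map, the scalar `gate`, and all function names are assumptions chosen for illustration; the abstract does not specify how LiZA or MaG are actually computed.

```python
import numpy as np


def softmax_attention(q, k, v):
    """Causal softmax attention. Builds a T x T score matrix (quadratic in T)."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # positions after t
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def feature_map(x):
    """elu(x) + 1: a positive feature map (an assumed choice, not from the paper)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v):
    """Causal linear attention with a running state (cost linear in T)."""
    T, d = q.shape
    phi_q, phi_k = feature_map(q), feature_map(k)
    state = np.zeros((d, v.shape[1]))  # running sum of phi(k_t) v_t^T
    norm = np.zeros(d)                 # running sum of phi(k_t)
    out = np.empty_like(v)
    for t in range(T):
        state += np.outer(phi_k[t], v[t])
        norm += phi_k[t]
        out[t] = (phi_q[t] @ state) / (phi_q[t] @ norm)
    return out


def gated_mix(q, k, v, gate):
    """Hypothetical gate: convex mix of the linear and softmax branches."""
    return gate * linear_attention(q, k, v) + (1.0 - gate) * softmax_attention(q, k, v)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
mixed = gated_mix(q, k, v, gate=0.5)  # shape (6, 4), same as v
```

With `gate=0.0` the sketch reduces to ordinary softmax attention, and with `gate=1.0` it uses only the linear branch, which loosely mirrors the abstract's remark that a model can be moved toward purely linear attention. In a real setup the gate would presumably be learned or scheduled rather than fixed by hand.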
