Nautile-370M: Spectral Memory Meets Attention in a Small Reasoning Model
For researchers building small language models with limited compute, this work offers a hybrid architecture designed to preserve the expressivity of attention while retaining the long-context efficiency of structured sequence models.
Nautile-370M is a 371M-parameter small language model whose hybrid backbone combines spectral memory layers (SeqCond Attention, SCA) with transformer layers, targeting efficient reasoning under strict parameter and inference budgets. The authors prove that the SCA readout mechanism can exactly retrieve any individual token and reproduce any softmax attention output, which makes SCA at least as expressive as full self-attention in the continuous limit.
We present Nautile-370M, a 371-million-parameter small language model designed for efficient reasoning under strict parameter and inference budgets. Nautile-370M uses a hybrid backbone in which two SeqCond Attention (SCA) layers alternate with one transformer layer; SCA is a linear-time spectral sequence operator inspired by SeqCondenser. This design aims to retain the long-context efficiency and state-tracking benefits of structured sequential models while preserving the expressive token-to-token routing of attention. The model was trained on a single Cloud TPU v4-64 pod slice provided through the Google TPU Research Cloud (TRC) program; the subsequent reinforcement learning stage was carried out on a single NVIDIA DGX Spark. We prove that the SCA readout mechanism can exactly retrieve any individual token from the prefix summary and can reproduce any output of softmax attention as a special case, establishing that SCA is at least as expressive as full self-attention in the continuous limit. We also describe the training data pipeline and outline a reinforcement learning stage specialized for reasoning, verification, and response quality.
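The interleaving described in the abstract (two SCA layers for every transformer layer) can be pictured as a repeating layer schedule. The sketch below is illustrative only: the function name `hybrid_layer_pattern`, the labels `"sca"` and `"transformer"`, the choice to start each block with the SCA layers, and the depth of 6 are all assumptions, since the abstract does not state the total number of layers or the exact ordering within a block.

```python
from typing import List


def hybrid_layer_pattern(
    num_layers: int,
    sca_per_block: int = 2,
    transformer_per_block: int = 1,
) -> List[str]:
    """Return a layer-type schedule for a hybrid backbone.

    Each repeating block holds `sca_per_block` SCA layers followed by
    `transformer_per_block` transformer layers (ordering is an assumption).
    """
    block = ["sca"] * sca_per_block + ["transformer"] * transformer_per_block
    # Cycle through the block until the requested depth is reached.
    return [block[i % len(block)] for i in range(num_layers)]


# Hypothetical depth of 6; the real model depth is not given in the abstract.
pattern = hybrid_layer_pattern(6)
print(pattern)
# ['sca', 'sca', 'transformer', 'sca', 'sca', 'transformer']
```

With this ratio, two thirds of the layers are linear-time operators, and the remaining attention layers keep explicit token-to-token routing.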