NeuronSpark: A Spiking Neural Network Language Model with Selective State Space Dynamics
This work targets efficient, biologically plausible language models for AI research. The contribution is incremental, since it builds on existing SNN and state-space methods.
The paper trains a pure spiking neural network (SNN) for large-scale language modeling without Transformer distillation. The resulting model, NeuronSpark-0.9B, reaches a pretraining loss of 3.6 and shows early multi-turn dialogue behavior after supervised fine-tuning (SFT).
We ask whether a pure spiking backbone can learn large-scale language modeling from random initialization, without Transformer distillation. We introduce NeuronSpark, a 0.9B-parameter SNN language model trained with next-token prediction and surrogate gradients. The model combines selective state-space spiking dynamics, leakage-current inter-layer communication, PonderNet adaptive timesteps, fused Triton PLIF kernels, and stabilization techniques (residual centering, lateral-inhibition normalization, and natural-gradient compensation). Under a constrained budget (about 1.4B pretraining tokens and 6.5K SFT steps), NeuronSpark-0.9B reaches 3.6 pretraining loss and shows early multi-turn dialogue behavior after SFT. These results support the feasibility of end-to-end language modeling with a pure SNN architecture at this scale.
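To make the terms "PLIF" and "surrogate gradients" concrete, here is a minimal sketch of a single spiking neuron. It assumes PLIF means the standard parametric leaky integrate-and-fire neuron, with a learnable inverse time constant 1/tau = sigmoid(w) and a hard reset, and it assumes a sigmoid-shaped surrogate for the derivative of the spike function. The function names, the slope `alpha`, and all default values are illustrative assumptions, not NeuronSpark's actual kernels or hyperparameters.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def plif_forward(inputs, w=0.0, v_threshold=1.0, v_reset=0.0):
    """Run one PLIF neuron over a sequence of input currents.

    Returns (spikes, pre_reset_potentials). `w` is the learnable
    parameter; 1/tau = sigmoid(w) controls how fast the membrane leaks.
    """
    k = sigmoid(w)  # inverse time constant, always in (0, 1)
    v = v_reset
    spikes, pre_reset = [], []
    for x in inputs:
        # Leaky integration: decay toward v_reset, then add the input.
        h = v + k * (-(v - v_reset) + x)
        # Non-differentiable spike: Heaviside step at the threshold.
        s = 1.0 if h >= v_threshold else 0.0
        # Hard reset after a spike; otherwise carry the potential forward.
        v = v_reset if s else h
        spikes.append(s)
        pre_reset.append(h)
    return spikes, pre_reset


def surrogate_grad(h, v_threshold=1.0, alpha=4.0):
    """Smooth stand-in for d(spike)/d(h), used only in the backward pass.

    The Heaviside step has zero gradient almost everywhere, so training
    replaces its derivative with that of sigmoid(alpha * (h - threshold)).
    """
    sg = sigmoid(alpha * (h - v_threshold))
    return alpha * sg * (1.0 - sg)


spikes, potentials = plif_forward([1.5, 1.5, 1.5])
grads = [surrogate_grad(h) for h in potentials]
```

With a constant input of 1.5 and w = 0, the membrane potential needs two steps to cross the threshold, so the neuron fires on the second step and then resets. The surrogate gradient is largest when the potential is near the threshold, which is what lets next-token-prediction loss propagate through otherwise binary spikes.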