Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models
This work presents a unifying framework for parallel-in-time sequence models whose structured, input-dependent state-transition matrices are cheaper to compute than dense matrices without sacrificing expressivity.
The authors introduce Structured Linear Controlled Differential Equations (SLiCEs), which achieve best-in-class length generalisation on regular language tasks among parallel-in-time models and match the performance of log neural controlled differential equations on six multivariate time-series classification datasets while cutting the average time per training step by a factor of twenty.
This work introduces Structured Linear Controlled Differential Equations (SLiCEs), a unifying framework for sequence models with structured, input-dependent state-transition matrices that retain the maximal expressivity of dense matrices whilst being cheaper to compute. The framework encompasses existing architectures, such as input-dependent block-diagonal linear recurrent neural networks and DeltaNet's diagonal-plus-low-rank structure, as well as two novel variants based on sparsity and the Walsh-Hadamard transform. We prove that, unlike the diagonal state-transition matrices of S4D and Mamba, SLiCEs employing block-diagonal, sparse, or Walsh-Hadamard matrices match the maximal expressivity of dense matrices. Empirically, SLiCEs solve the $A_5$ state-tracking benchmark with a single layer, achieve best-in-class length generalisation on regular language tasks among parallel-in-time models, and match the performance of log neural controlled differential equations on six multivariate time-series classification datasets while cutting the average time per training step by a factor of twenty.
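To make the "structured, input-dependent, parallel-in-time" idea concrete, the following is a minimal sketch of a block-diagonal linear recurrence in plain Python. It is not the authors' implementation. It assumes a first-order update h_t = (I + sum_i x_t[i] * W_i) h_{t-1} applied independently to each block, with each x_t treated as an increment of the input. The function names, weights, and inputs are hypothetical and chosen only for illustration. Because matrix multiplication is associative, the per-step transition matrices can be composed pairwise in a tree, which is what allows the time dimension to be processed in parallel.

```python
# Illustrative sketch only: all names, shapes and values are hypothetical,
# and the first-order update rule is an assumption, not the paper's code.

def matmul(a, b):
    """Product of two small square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matvec(a, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(a_rk * v_k for a_rk, v_k in zip(row, v)) for row in a]


def block_transition(x_t, block_weights):
    """Return I + sum_i x_t[i] * W_i for one block (each W_i is b x b)."""
    b = len(block_weights[0])
    m = [[1 if r == c else 0 for c in range(b)] for r in range(b)]
    for x_i, w in zip(x_t, block_weights):
        for r in range(b):
            for c in range(b):
                m[r][c] += x_i * w[r][c]
    return m


def sequential_final_state(xs, weights, h0_blocks):
    """Step through time; each block only updates its own slice of the state."""
    h = [list(hb) for hb in h0_blocks]
    for x_t in xs:
        h = [matvec(block_transition(x_t, w), hb) for w, hb in zip(weights, h)]
    return h


def tree_product(mats):
    """Compose M_T ... M_1 by pairwise products, later steps on the left.

    The products within one level are independent of each other, so each
    level could be evaluated in parallel.
    """
    while len(mats) > 1:
        paired = [matmul(mats[i + 1], mats[i])
                  for i in range(0, len(mats) - 1, 2)]
        if len(mats) % 2:
            paired.append(mats[-1])  # odd one out is the latest step
        mats = paired
    return mats[0]


def parallel_final_state(xs, weights, h0_blocks):
    """Same final state as the sequential loop, via one product per block."""
    return [matvec(tree_product([block_transition(x_t, w) for x_t in xs]), hb)
            for w, hb in zip(weights, h0_blocks)]


# Two blocks of size 2, two input channels, five time steps (integer values
# keep the arithmetic exact).
weights = [
    [[[0, 1], [-1, 0]], [[1, 0], [0, -1]]],  # block 0: W_1, W_2
    [[[0, 1], [0, 0]], [[0, 0], [1, 0]]],    # block 1: W_1, W_2
]
xs = [[1, 0], [0, 1], [1, 1], [-1, 2], [2, -1]]
h0 = [[1, 0], [0, 1]]

seq = sequential_final_state(xs, weights, h0)
par = parallel_final_state(xs, weights, h0)
print(seq == par)
```

In this sketch each block costs a b x b product per step rather than a full d x d product for state dimension d, and the matrices within a block need not commute, which diagonal transitions cannot offer. A practical implementation would use an associative scan that returns every intermediate state rather than only the final one.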