Elastic Spectral State Space Models for Budgeted Inference
This work addresses the need for flexible model deployment without retraining. The contribution is incremental, since it builds on existing state space models and spectral filtering techniques.
The paper tackles the problem of deploying foundation models across platforms with varying resource constraints. It proposes Elastic Spectral State Space Models (ES-SSM), in which a single model trained at full capacity can be truncated at runtime for budgeted inference. Across multiple benchmarks, ES-SSM performs competitively with baselines such as Transformers and SSMs.
Foundation models are typically trained at a fixed computational capacity, while real-world applications require deployment across platforms with different resource constraints. Current approaches usually rely on training families of model variants or on model distillation, both of which require additional training and support only a pre-selected set of sizes rather than fine-grained adaptation at runtime. In this paper, we propose Elastic Spectral State Space Models (ES-SSM), which require only one-time training at full capacity, but can be directly truncated into arbitrary scales for budgeted, runtime inference without retraining. Our ES-SSM builds on Hankel spectral filtering over a state space model (SSM), coupled with a lightweight input-adaptive gate trained under randomized spectral budgets. Using a shared masked normalization rule over the ordered spectral channels, we encourage predictive capability to concentrate in low-index components, while higher-index components act primarily as refinement. We test our algorithm across long-sequence benchmarks spanning text, logic, retrieval, vision, and audio. We demonstrate that a single ES-SSM model trained once can be truncated to provide competitive performance compared with modern Transformer and SSM baselines at similar parameter scales. Furthermore, by testing under various runtime budgets, we observe smooth and stable budget-performance curves over a wide range of truncation levels.
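The abstract does not give the exact form of the gate or of the masked normalization rule. As an illustration only, the sketch below assumes a softmax gate restricted to a prefix of the ordered spectral channels, with the budget drawn uniformly at random during training. The names `masked_gate` and `mix_channels`, the uniform sampling, and the random stand-in channel outputs are all hypothetical and are not taken from the paper.

```python
import numpy as np

def masked_gate(logits, budget):
    """Softmax over the first `budget` channels; later channels get weight 0.

    Assumed form of a "masked normalization" rule, not the paper's definition.
    """
    num_channels = logits.shape[-1]
    keep = np.arange(num_channels) < budget        # prefix mask over ordered channels
    masked = np.where(keep, logits, -np.inf)       # dropped channels -> exp(-inf) = 0
    masked = masked - masked.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)

def mix_channels(channel_outputs, logits, budget):
    """Combine per-channel outputs of shape (K, d) into one (d,) vector."""
    return masked_gate(logits, budget) @ channel_outputs

rng = np.random.default_rng(0)
K, d = 8, 4                                        # hypothetical channel count and width
channel_outputs = rng.normal(size=(K, d))          # stand-in for spectral filter outputs
logits = rng.normal(size=K)                        # stand-in for input-adaptive gate scores

# Training step: sample a random budget (uniform sampling is an assumption).
train_budget = int(rng.integers(1, K + 1))
y_train = mix_channels(channel_outputs, logits, train_budget)

# Inference: the same parameters, truncated to a small or a full budget.
y_small = mix_channels(channel_outputs, logits, 2)
y_full = mix_channels(channel_outputs, logits, K)
```

Because every budget renormalizes over the same prefix of channels, a model trained this way always sees a valid mixture regardless of where it is truncated, which is one plausible reading of why low-index components would end up carrying most of the predictive signal.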