Umberto Casti

h-index3
2papers

2 Papers

LGOct 15, 2025
Context-Selective State Space Models: Feedback is All You Need

Riccardo Zattra, Giacomo Baggio, Umberto Casti et al.

Transformers, powered by the attention mechanism, are the backbone of most foundation models, yet they suffer from quadratic complexity and difficulties in dealing with long-range dependencies in the input sequence. Recent work has shown that state space models (SSMs) provide an efficient alternative, with the S6 module at the core of the Mamba architecture achieving state-of-the-art results on long-sequence benchmarks. In this paper, we introduce the COFFEE (COntext From FEEdback) model, a novel time-varying SSM that incorporates state feedback to enable context-dependent selectivity, while still allowing for parallel implementation. Whereas the selectivity mechanism of S6 only depends on the current input, COFFEE computes it from the internal state, which serves as a compact representation of the sequence history. This shift allows the model to regulate its dynamics based on accumulated context, improving its ability to capture long-range dependencies. In addition to state feedback, we employ an efficient model parametrization that removes redundancies present in S6 and leads to a more compact and trainable formulation. On the induction head task, COFFEE achieves near-perfect accuracy with two orders of magnitude fewer parameters and training sequences compared to S6. On MNIST, COFFEE largely outperforms S6 within the same architecture, reaching 97% accuracy with only 3585 parameters. These results showcase the role of state feedback as a key mechanism for building scalable and efficient sequence models.

SYMay 23, 2025
Selection Mechanisms for Sequence Modeling using Linear State Space Models

Umberto Casti, Sandro Zampieri, Fabio Pasqualetti

Recent advancements in language modeling tasks have been driven by architectures such as Transformers and, more recently, by Selective State Space Models (SSMs). In this paper, we introduce an alternative selection mechanism inspired by control theory methodologies. Specifically, we propose a novel residual generator for selection, drawing an analogy to fault detection strategies in Linear Time-Invariant (LTI) systems. Unlike Mamba, which utilizes Linear Time-Varying (LTV) systems, our approach combines multiple LTI systems, preserving their beneficial properties during training while achieving comparable selectivity. To evaluate the effectiveness of the proposed architecture, we test its performance on synthetic tasks. While these tasks are not inherently critical, they serve as benchmarks to test the selectivity properties of different cores architecture. This work highlights the potential of integrating theoretical insights with experimental advancements, offering a complementary perspective to deep learning innovations at the intersection of control theory and machine learning.