LG ML Jul 10, 2024

Towards a theory of learning dynamics in deep state space models

Jakub Smékal, Jimmy T. H. Smith, Michael Kleinman, Dan Biderman, Scott W. Linderman

arXiv:2407.07279v19.26 citations h-index: 31

Originality: Synthesis-oriented

AI Analysis

This work provides incremental theoretical insights into learning dynamics for researchers studying state space models in long sequence modeling.

The authors address the lack of theoretical understanding of state space models (SSMs) by studying the learning dynamics of linear SSMs, analyzing how data covariance, latent state size, and initialization affect parameter evolution under gradient descent. They show that frequency-domain analysis yields analytical solutions under mild assumptions, link one-dimensional SSMs to deep linear feed-forward networks, and analyze how latent state over-parameterization affects convergence time.

State space models (SSMs) have shown remarkable empirical performance on many long sequence modeling tasks, but a theoretical understanding of these models is still lacking. In this work, we study the learning dynamics of linear SSMs to understand how covariance structure in data, latent state size, and initialization affect the evolution of parameters throughout learning with gradient descent. We show that focusing on the learning dynamics in the frequency domain affords analytical solutions under mild assumptions, and we establish a link between one-dimensional SSMs and the dynamics of deep linear feed-forward networks. Finally, we analyze how latent state over-parameterization affects convergence time and describe future work in extending our results to the study of deep SSMs with nonlinear connections. This work is a step toward a theory of learning dynamics in deep state space models.
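To make the frequency-domain view concrete, the following is a minimal sketch of a one-dimensional linear SSM. It is an illustration under stated assumptions, not the authors' setup: the recurrence `x_t = a*x_{t-1} + b*u_t`, `y_t = c*x_t`, the scalar names `a`, `b`, `c`, and all helper functions are hypothetical choices made here. The sketch checks numerically that the discrete Fourier transform of the length-`L` impulse response equals the closed form `c*b*(1 - a**L) / (1 - a*exp(-2*pi*i*k/L))`, in which `b` and `c` enter only through their product.

```python
import cmath


def ssm_run(a, b, c, inputs):
    """Run the assumed scalar SSM: x_t = a*x_{t-1} + b*u_t, y_t = c*x_t."""
    x = 0.0
    outputs = []
    for u in inputs:
        x = a * x + b * u
        outputs.append(c * x)
    return outputs


def impulse_response(a, b, c, length):
    """Convolution kernel k_t = c * a**t * b for t = 0..length-1."""
    return [c * (a ** t) * b for t in range(length)]


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), stdlib only."""
    n = len(signal)
    return [
        sum(s * cmath.exp(-2j * cmath.pi * k * t / n) for t, s in enumerate(signal))
        for k in range(n)
    ]


def closed_form_response(a, b, c, length):
    """Geometric-series closed form of the DFT of the truncated kernel."""
    return [
        c * b * (1 - a ** length) / (1 - a * cmath.exp(-2j * cmath.pi * k / length))
        for k in range(length)
    ]


# Example parameters (arbitrary illustrative values, with |a| < 1).
a, b, c, L = 0.9, 0.5, 2.0, 16

kernel = impulse_response(a, b, c, L)
# Feeding a unit impulse through the recurrence reproduces the kernel.
impulse_out = ssm_run(a, b, c, [1.0] + [0.0] * (L - 1))
kernel_error = max(abs(p - q) for p, q in zip(kernel, impulse_out))

# The numerical DFT of the kernel matches the closed form at every frequency.
freq_error = max(
    abs(p - q) for p, q in zip(dft(kernel), closed_form_response(a, b, c, L))
)
```

In this sketch each frequency of the output is a scalar multiple of the same frequency of the input, and that multiple contains the product `c*b`. A product of weights is also what appears in a deep linear network, which may give some intuition for the link the abstract describes, though the precise correspondence is developed in the paper itself.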
