A Switching System Theory of Q-Learning with Linear Function Approximation
For researchers in reinforcement learning, this work offers a new theoretical perspective on the convergence of Q-learning with linear function approximation. It remains a theoretical analysis without empirical validation.
This paper provides a switching-system interpretation of Q-learning with linear function approximation, deriving an exact linear switched model for the mean dynamics and relating convergence to joint spectral radius stability. The framework yields stability certificates that can be less conservative than one-step norm bounds, and it connects projected Bellman equations, stochastic-policy switching, and switched-system stability.
This paper develops a switching-system interpretation of Q-learning with linear function approximation (LFA) based on the joint spectral radius (JSR). We derive an exact linear switched model for the mean dynamics and relate convergence to stability of the corresponding switched system. The same construction is then used for stochastic linear Q-learning with independent and identically distributed (i.i.d.) observations and with Markovian observations. Although exact JSR computation is difficult in general, the certificate captures products of switching modes and can be less conservative than one-step norm bounds. The framework also yields a JSR-based view of regularized Q-learning with LFA. The resulting analysis connects projected Bellman equations, finite-difference stochastic-policy switching, and switched-system stability in a single parameter-space formulation.
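To illustrate why a certificate built from products of switching modes can be less conservative than a one-step norm bound, here is a minimal sketch. It is not the paper's construction: the two matrices `A1` and `A2` are hypothetical toy modes chosen for easy hand verification, and the helper `jsr_bounds` is an assumed name for a brute-force enumeration of all length-k products. For any k, the largest spectral norm over length-k products, raised to the power 1/k, upper-bounds the JSR, and the largest spectral radius over the same products, raised to 1/k, lower-bounds it.

```python
import functools
import itertools

import numpy as np


def jsr_bounds(mats, k):
    """Return (lower, upper) bounds on the JSR from all length-k products.

    lower: max spectral radius of a length-k product, to the power 1/k
    upper: max spectral norm of a length-k product, to the power 1/k
    """
    lower, upper = 0.0, 0.0
    for word in itertools.product(mats, repeat=k):
        prod = functools.reduce(np.matmul, word)
        rho = np.abs(np.linalg.eigvals(prod)).max()
        nrm = np.linalg.norm(prod, 2)
        lower = max(lower, rho ** (1.0 / k))
        upper = max(upper, nrm ** (1.0 / k))
    return lower, upper


# Hypothetical toy modes (not taken from the paper).
A1 = np.array([[0.0, 2.0], [0.0, 0.0]])
A2 = np.array([[0.0, 0.0], [0.1, 0.0]])

lower1, upper1 = jsr_bounds([A1, A2], 1)  # one-step norm bound
lower2, upper2 = jsr_bounds([A1, A2], 2)  # bound from length-2 products

print("one-step upper bound:", upper1)
print("two-step bounds:", lower2, upper2)
```

In this toy case the one-step bound equals 2, so it cannot certify stability, while the length-2 bounds coincide at sqrt(0.2) ≈ 0.447, which is below 1. The enumeration grows exponentially in k, consistent with the remark that exact JSR computation is difficult in general.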