LG | ML | Dec 12, 2023

Can a Transformer Represent a Kalman Filter?

arXiv:2312.06937v3 | 29 citations | h-index: 1 | L4DC
Originality: Incremental advance
AI Analysis

The paper provides a theoretical foundation for using Transformers in state estimation and control, with potential impact on robotics and related fields. The advance is incremental, since it builds on existing Transformer and Kalman Filter concepts.

The paper asks whether a Transformer can represent a Kalman Filter for linear dynamical systems. It shows that a causally-masked Transformer can approximate the Kalman Filter up to a small additive error bounded uniformly in time, and that controllers built on this filter closely approximate the performance of optimal control policies such as LQG.

Transformers are a class of autoregressive deep learning architectures which have recently achieved state-of-the-art performance in various vision, language, and robotics tasks. We revisit the problem of Kalman Filtering in linear dynamical systems and show that Transformers can approximate the Kalman Filter in a strong sense. Specifically, for any observable LTI system we construct an explicit causally-masked Transformer which implements the Kalman Filter, up to a small additive error which is bounded uniformly in time; we call our construction the Transformer Filter. Our construction is based on a two-step reduction. We first show that a softmax self-attention block can exactly represent a Nadaraya-Watson kernel smoothing estimator with a Gaussian kernel. We then show that this estimator closely approximates the Kalman Filter. We also investigate how the Transformer Filter can be used for measurement-feedback control and prove that the resulting nonlinear controllers closely approximate the performance of standard optimal control policies such as the LQG controller.
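The first step of the reduction, that softmax attention can reproduce a Gaussian-kernel Nadaraya-Watson estimator, can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's exact construction: the function names, the bandwidth parameter, and the particular embedding (appending a constant 1 to the query and the term -||x||^2 / (2h^2) to each key) are hypothetical choices made here. It relies on the identity -||q - x||^2 / (2h^2) = q.x / h^2 - ||x||^2 / (2h^2) - ||q||^2 / (2h^2), where the last term is the same for every key and therefore cancels inside the softmax.

```python
import math


def nadaraya_watson(query, xs, ys, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of y at `query`."""
    weights = [
        math.exp(-sum((q - x) ** 2 for q, x in zip(query, x_i)) / (2.0 * bandwidth ** 2))
        for x_i in xs
    ]
    total = sum(weights)
    dim = len(ys[0])
    return [sum(w * y[d] for w, y in zip(weights, ys)) / total for d in range(dim)]


def softmax_attention(query, keys, values):
    """Single-query softmax attention: softmax(<query, key_i>) applied to the values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def gaussian_embedding(query, xs, bandwidth):
    """Hypothetical embedding that turns dot-product scores into Gaussian log-weights.

    query -> [q, 1]; key_i -> [x_i / h^2, -||x_i||^2 / (2 h^2)].
    The missing -||q||^2 / (2 h^2) term is constant across keys and cancels in the softmax.
    """
    h2 = bandwidth ** 2
    q_aug = list(query) + [1.0]
    k_aug = [
        [x / h2 for x in x_i] + [-sum(x * x for x in x_i) / (2.0 * h2)]
        for x_i in xs
    ]
    return q_aug, k_aug


# Small made-up data set: four 2-D inputs with scalar targets.
xs = [[0.0, 1.0], [1.0, 0.5], [-0.5, 2.0], [2.0, -1.0]]
ys = [[1.0], [2.0], [0.5], [-1.0]]
query = [0.3, 0.8]
bandwidth = 0.7

nw = nadaraya_watson(query, xs, ys, bandwidth)
q_aug, k_aug = gaussian_embedding(query, xs, bandwidth)
attn = softmax_attention(q_aug, k_aug, ys)

# The two estimates agree up to floating-point error.
print(nw, attn)
```

In a causally-masked setting, the keys and values would be restricted to time steps up to the current one; the masking itself is omitted here for brevity.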

