Robust Automatic Differentiation of Square-Root Kalman Filters via Gramian Differentials
This work addresses a specific numerical-stability issue in gradient-based learning for state-space models: differentiating through the triangularization step of square-root Kalman filters. It is an incremental improvement aimed at practitioners in fields such as control systems and signal processing.
The paper tackles the problem of undefined or divergent gradients that arise when differentiating through the QR decomposition in square-root Kalman filters, which hinders gradient-based parameter learning in state-space models. The authors resolve this by deriving a closed-form chain rule based on the Gramian differential, proving it exact for the key filter outputs (the log-marginal likelihood and the filtered moments), and extending it to rank-deficient inputs, which enables robust automatic differentiation.
Square-root Kalman filters propagate state covariances in Cholesky-factor form for numerical stability and are a natural target for gradient-based parameter learning in state-space models. Their core operation, triangularization of a matrix $M \in \mathbb{R}^{n \times m}$ into a triangular factor $L$ satisfying $LL^\top = MM^\top$, is computed via a QR decomposition in practice, but naively differentiating through it causes two problems: the semi-orthogonal factor is non-unique when $m > n$, yielding undefined gradients; and the standard Jacobian formula involves matrix inverses, so it diverges when $M$ is rank-deficient. Both are resolved by the observation that all filter outputs relevant to learning depend on the input matrix only through the Gramian $MM^\top$, so the composite loss is smooth in $M$ even where the triangularization is not. We derive a closed-form chain rule directly from the differential of this Gramian identity, prove it exact for the Kalman log-marginal likelihood and filtered moments, and extend it to rank-deficient inputs via a two-component decomposition: a column-space term based on the Moore--Penrose pseudoinverse, and a null-space correction for perturbations outside the column space of $M$.
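To make the Gramian argument concrete, the following is a minimal NumPy sketch of the full-rank case only. It is an illustration under stated assumptions and not the paper's implementation: the names `tria` and `tria_jvp` are hypothetical, the triangular factor is taken to be lower triangular with a non-negative diagonal, and $M$ is assumed to have full row rank with $m \ge n$. Writing $L$ for the factor with $LL^\top = MM^\top$, differentiating both sides gives $dL\,L^\top + L\,dL^\top = dM\,M^\top + M\,dM^\top$, which can be solved for $dL$ without ever referring to the orthogonal factor. The rank-deficient extension (pseudoinverse term plus null-space correction) is not reproduced here, since the abstract does not state its formulas.

```python
import numpy as np

def tria(M):
    """Return lower-triangular L with L @ L.T == M @ M.T (assumes m >= n, full row rank)."""
    R = np.linalg.qr(M.T, mode="r")               # M.T = Q R; only R (n x n) is needed
    L = R.T
    signs = np.where(np.diag(L) < 0, -1.0, 1.0)   # flip columns so that diag(L) >= 0
    return L * signs

def tria_jvp(M, dM):
    """Tangent dL solving dL L^T + L dL^T = dM M^T + M dM^T (full-rank case only)."""
    L = tria(M)
    dG = dM @ M.T + M @ dM.T                            # differential of the Gramian M M^T
    S = np.linalg.solve(L, np.linalg.solve(L, dG).T)    # S = L^{-1} dG L^{-T}
    Phi = np.tril(S, -1) + 0.5 * np.diag(np.diag(S))    # strict lower part plus half the diagonal
    return L @ Phi

rng = np.random.default_rng(0)
n, m = 3, 5
M = rng.standard_normal((n, m))
dM = rng.standard_normal((n, m))

# Right-multiplying M by an orthogonal U leaves M M^T, and hence L, unchanged.
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
print(np.allclose(tria(M @ U), tria(M)))

# Compare the closed-form tangent with a central finite difference.
h = 1e-6
fd = (tria(M + h * dM) - tria(M - h * dM)) / (2 * h)
print(np.allclose(tria_jvp(M, dM), fd, atol=1e-6))
```

The first check illustrates why the non-uniqueness of the semi-orthogonal factor is harmless: any quantity built from $L$ sees $M$ only through $MM^\top$. The two triangular solves against $L$ are where this sketch would break down for rank-deficient $M$, which is the situation the paper's two-component decomposition is meant to handle.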