LG · AI · Sep 12, 2025

Kalman Bayesian Transformer

CMU
arXiv:2509.10695v1 · h-index: 11 · CDC
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of stabilizing training in latency-critical environments with shifting data distributions. The advance is incremental because it builds on existing Bayesian and transformer methods.

The paper tackles the problem of sequential fine-tuning of transformers under distribution shifts and limited data by framing it as a Bayesian posterior inference problem, achieving robust and data-efficient learning as demonstrated in numerical simulations with a decision transformer.

Sequential fine-tuning of transformers is useful when new data arrive sequentially, especially with shifting distributions. Unlike batch learning, sequential learning demands that training be stabilized despite a small amount of data by balancing new information and previously learned knowledge in the pre-trained models. This challenge is further complicated when training is to be completed in latency-critical environments and learning must additionally quantify and be mediated by uncertainty. Motivated by these challenges, we propose a novel method that frames sequential fine-tuning as a posterior inference problem within a Bayesian framework. Our approach integrates closed-form moment propagation of random variables, Kalman Bayesian Neural Networks, and Taylor approximations of the moments of softmax functions. By explicitly accounting for pre-trained models as priors and adaptively balancing them against new information based on quantified uncertainty, our method achieves robust and data-efficient sequential learning. The effectiveness of our method is demonstrated through numerical simulations involving sequential adaptation of a decision transformer to tasks characterized by distribution shifts and limited memory resources.
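The idea of treating pre-trained weights as a prior and weighing them against new data by uncertainty can be illustrated with a minimal sketch. It assumes a single linear output y = x · w + noise with a Gaussian prior over the weights, which is far simpler than the transformer setting in the paper; the function name `kalman_update` and its parameters are hypothetical and do not come from the paper.

```python
def kalman_update(mean, cov, x, y, noise_var):
    """One Kalman-style Bayesian update for y = x . w + noise.

    mean, cov : prior over weights w ~ N(mean, cov), e.g. pre-trained
                weights with an assumed uncertainty (illustrative only).
    x, y      : one new input vector and its observed scalar target.
    noise_var : assumed variance of the observation noise.
    """
    n = len(mean)
    # cov @ x: how the prior uncertainty projects onto the input direction
    cov_x = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Predicted output and its variance before seeing y
    y_pred = sum(x[i] * mean[i] for i in range(n))
    innovation_var = sum(x[i] * cov_x[i] for i in range(n)) + noise_var
    # Kalman gain: large when the prior is uncertain, small when it is confident
    gain = [cov_x[i] / innovation_var for i in range(n)]
    # Posterior mean moves toward the data in proportion to the gain
    new_mean = [mean[i] + gain[i] * (y - y_pred) for i in range(n)]
    # Posterior covariance shrinks along the observed direction
    # (uses symmetry of cov, so x^T cov equals cov_x transposed)
    new_cov = [[cov[i][j] - gain[i] * cov_x[j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


# Same observation, two different levels of prior confidence
x, y = [1.0, 0.0], 1.0
uncertain = kalman_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], x, y, 1.0)
confident = kalman_update([0.0, 0.0], [[0.01, 0.0], [0.0, 0.01]], x, y, 1.0)
print(uncertain[0], confident[0])
```

With the same observation, the uncertain prior shifts its mean much further toward the data than the confident one. This is the balancing behaviour the abstract describes, shown here only for a linear Gaussian model where the update is exact.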

