LG · AI · Sep 12, 2025

Kalman Bayesian Transformer

Haoming Jing, Oren Wright, José M. F. Moura, Yorie Nakahira

CMU

arXiv:2509.10695v14.1 · h-index: 11 · CDC

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of stabilizing training in latency-critical environments with shifting data distributions. The advance is incremental, since it builds on existing Bayesian and transformer methods.

The paper tackles the problem of sequential fine-tuning of transformers under distribution shifts and limited data by framing it as a Bayesian posterior inference problem, achieving robust and data-efficient learning as demonstrated in numerical simulations with a decision transformer.

Sequential fine-tuning of transformers is useful when new data arrive sequentially, especially with shifting distributions. Unlike batch learning, sequential learning demands that training be stabilized despite a small amount of data by balancing new information and previously learned knowledge in the pre-trained models. This challenge is further complicated when training is to be completed in latency-critical environments and learning must additionally quantify and be mediated by uncertainty. Motivated by these challenges, we propose a novel method that frames sequential fine-tuning as a posterior inference problem within a Bayesian framework. Our approach integrates closed-form moment propagation of random variables, Kalman Bayesian Neural Networks, and Taylor approximations of the moments of softmax functions. By explicitly accounting for pre-trained models as priors and adaptively balancing them against new information based on quantified uncertainty, our method achieves robust and data-efficient sequential learning. The effectiveness of our method is demonstrated through numerical simulations involving sequential adaptation of a decision transformer to tasks characterized by distribution shifts and limited memory resources.
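The idea of treating a pre-trained model as a prior and weighing it against new data according to quantified uncertainty can be illustrated with a toy example. The sketch below is not the paper's algorithm: it is a scalar linear-Gaussian Kalman update for a single weight `w` in the assumed model `y = w * x + noise`, and the function name `kalman_update` and all numeric values are hypothetical choices made for illustration only.

```python
def kalman_update(prior_mean, prior_var, x, y, noise_var):
    """One Bayesian (Kalman) update of a scalar weight w, assuming y = w * x + noise.

    prior_mean, prior_var: Gaussian belief about w before seeing (x, y).
    noise_var: assumed variance of the observation noise.
    Returns the posterior mean and variance of w.
    """
    # Predicted observation and its variance under the prior belief.
    pred_mean = prior_mean * x
    pred_var = x * prior_var * x + noise_var
    # Kalman gain: large when the prior is uncertain relative to the noise.
    gain = prior_var * x / pred_var
    # Move the mean toward the data in proportion to the gain.
    post_mean = prior_mean + gain * (y - pred_mean)
    # Observing data shrinks the variance.
    post_var = (1.0 - gain * x) * prior_var
    return post_mean, post_var


# Same observation, two different levels of confidence in the "pre-trained" weight.
x_obs, y_obs, noise = 1.0, 3.0, 1.0
confident_mean, confident_var = kalman_update(1.0, 0.01, x_obs, y_obs, noise)
uncertain_mean, uncertain_var = kalman_update(1.0, 100.0, x_obs, y_obs, noise)

# Sequential use: each posterior becomes the prior for the next sample.
mean, var = 1.0, 1.0
for x_t, y_t in [(1.0, 3.0), (2.0, 6.2), (0.5, 1.4)]:
    mean, var = kalman_update(mean, var, x_t, y_t, noise)
```

With a confident prior the estimate barely moves from its initial value, while with an uncertain prior it moves almost all the way to the value implied by the new observation. In this toy setting, that is the kind of uncertainty-mediated balance between prior knowledge and new information that the abstract describes.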
