LG · IT · Sep 3, 2021

A Bayesian Approach to (Online) Transfer Learning: Theory and Algorithms

arXiv:2109.01377v2 · 23 citations
AI Analysis

This work addresses negative transfer, a practical concern for anyone applying transfer learning. Its contribution is incremental, since it builds on existing Bayesian and information-theoretic approaches.

The paper tackles the problem of negative transfer in transfer learning by developing a Bayesian framework and deriving information-theoretic bounds to characterize conditions for negative transfer, with examples showing accurate bounds even for small sample sizes. It also devises two practical online transfer learning algorithms, demonstrating effectiveness on real datasets when source and target data are similar.

Transfer learning is a machine learning paradigm where knowledge from one problem is utilized to solve a new but related problem. While it is conceivable that knowledge from one task could be useful for solving a related task, transfer learning algorithms that are not executed properly can impair learning performance instead of improving it, a phenomenon commonly known as negative transfer. In this paper, we study transfer learning from a Bayesian perspective, where a parametric statistical model is used. Specifically, we study three variants of the transfer learning problem: instantaneous, online, and time-variant transfer learning. For each problem, we define an appropriate objective function and provide either exact expressions or upper bounds on the learning performance using information-theoretic quantities, which allow simple and explicit characterizations when the sample size becomes large. Furthermore, examples show that the derived bounds are accurate even for small sample sizes. The obtained bounds give valuable insights into the effect of prior knowledge on transfer learning, at least with respect to our Bayesian formulation of the transfer learning problem. In particular, we formally characterize the conditions under which negative transfer occurs. Lastly, we devise two (online) transfer learning algorithms that are amenable to practical implementation, one of which does not require the parametric assumption. We demonstrate the effectiveness of our algorithms on real data sets, focusing primarily on cases where the source and target data have strong similarities.
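As a rough illustration of what negative transfer means, the sketch below uses a toy Gaussian mean-estimation problem. It is not the paper's framework, bounds, or algorithms: the squared-error risk, the conjugate Gaussian setup, and the names `mse_target_only`, `mse_transfer`, and `is_negative_transfer` are all assumptions made for this example. Source samples (`m` of them) are treated as prior knowledge and pooled with `n` target samples. `delta` is the assumed gap between the source and target means.

```python
def mse_target_only(sigma2: float, n: int) -> float:
    """Risk of the sample mean computed from n target samples alone."""
    return sigma2 / n


def mse_transfer(sigma2: float, m: int, n: int, delta: float) -> float:
    """Risk of the pooled posterior mean (m source + n target samples).

    Assumes a flat initial prior and a shared, known noise variance sigma2,
    so the posterior mean is the pooled sample mean. Pooling lowers the
    variance but adds a bias proportional to the source/target gap delta.
    """
    variance = sigma2 / (m + n)
    bias = m * delta / (m + n)
    return variance + bias ** 2


def is_negative_transfer(sigma2: float, m: int, n: int, delta: float) -> bool:
    """True when using the source data hurts relative to target-only learning."""
    return mse_transfer(sigma2, m, n, delta) > mse_target_only(sigma2, n)


if __name__ == "__main__":
    # Hypothetical settings: 100 source samples, 10 target samples, unit noise.
    for delta in (0.0, 0.1, 0.5, 1.0):
        print(delta, is_negative_transfer(1.0, 100, 10, delta))
```

In this toy model, transfer hurts exactly when `delta**2 > sigma2 * (1/m + 1/n)`. Similar source and target tasks therefore benefit from transfer, while a large mismatch makes the source data harmful no matter how much of it is available.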

