ML, LG · Sep 5, 2021

Robust Importance Sampling for Error Estimation in the Context of Optimal Bayesian Transfer Learning

Omar Maddouri, Xiaoning Qian, Francis J. Alexander, Edward R. Dougherty, Byung-Jun Yoon

arXiv:2109.02150v1 · 3.65 citations

Originality: Incremental advance

AI Analysis

This work addresses performance assessment in transfer learning for classification, a problem that matters most in scientific and clinical settings where data are limited. Its contribution is incremental in that it applies Bayesian methods to error estimation.

The paper tackles accurate classification error estimation in small-sample settings by introducing a novel class of Bayesian minimum mean-square error (MMSE) estimators for optimal Bayesian transfer learning (OBTL). On both synthetic and real-world RNA-seq data, the proposed estimator outperforms standard error estimators, especially when samples are scarce.

Classification has been a major task for building intelligent systems as it enables decision-making under uncertainty. Classifier design aims at building models from training data for representing feature-label distributions, either explicitly or implicitly. In many scientific or clinical settings, training data are typically limited, which makes designing accurate classifiers and evaluating their classification error extremely challenging. While transfer learning (TL) can alleviate this issue by incorporating data from relevant source domains to improve learning in a different target domain, it has received little attention for performance assessment, notably in error estimation. In this paper, we fill this gap by investigating knowledge transferability in the context of classification error estimation within a Bayesian paradigm. We introduce a novel class of Bayesian minimum mean-square error (MMSE) estimators for optimal Bayesian transfer learning (OBTL), which enables rigorous evaluation of classification error under uncertainty in a small-sample setting. Using Monte Carlo importance sampling, we employ the proposed estimator to evaluate the classification accuracy of a broad family of classifiers that span diverse learning capabilities. Experimental results based on both synthetic data and real-world RNA sequencing (RNA-seq) data show that our proposed OBTL error estimation scheme clearly outperforms standard error estimators, especially in a small-sample setting, by tapping into the data from other relevant domains.
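
The general idea of a Bayesian MMSE error estimate computed by Monte Carlo importance sampling can be illustrated with a small sketch. This is not the paper's OBTL estimator: it assumes a toy one-dimensional model (class 0 is N(0, 1), class 1 is N(theta, 1), equal class priors), a Gaussian posterior over the unknown mean theta, and a fixed threshold classifier. All function and parameter names below are hypothetical. The MMSE estimate is the posterior expectation of the true error, approximated here with self-normalized importance weights and checked against the closed form that this toy model happens to admit.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def log_normal_pdf(x, mean, std):
    """Log-density of N(mean, std^2) at x."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2.0 * math.pi))


def true_error(theta, threshold):
    """Error of the rule 'predict class 1 if x > threshold' in the toy model
    (class 0 ~ N(0, 1), class 1 ~ N(theta, 1), equal class priors)."""
    miss_class0 = 1.0 - norm_cdf(threshold)      # class 0 sample lands above threshold
    miss_class1 = norm_cdf(threshold - theta)    # class 1 sample lands below threshold
    return 0.5 * miss_class0 + 0.5 * miss_class1


def is_mmse_error(threshold, post_mean, post_std, prop_mean, prop_std,
                  n_draws=200_000, seed=0):
    """Self-normalized importance sampling estimate of E_posterior[error(theta)].

    Assumption: the posterior over theta is N(post_mean, post_std^2) and the
    proposal is N(prop_mean, prop_std^2), chosen wider than the posterior.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(prop_mean, prop_std, size=n_draws)
    log_w = (log_normal_pdf(theta, post_mean, post_std)
             - log_normal_pdf(theta, prop_mean, prop_std))
    w = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
    w /= w.sum()                      # self-normalize the weights
    return float(np.sum(w * true_error(theta, threshold)))


def exact_mmse_error(threshold, post_mean, post_std):
    """Closed-form posterior expectation of the error for this toy model."""
    scale = math.sqrt(1.0 + post_std ** 2)
    return float(0.5 * (1.0 - norm_cdf(threshold))
                 + 0.5 * norm_cdf((threshold - post_mean) / scale))


if __name__ == "__main__":
    est = is_mmse_error(1.0, post_mean=2.0, post_std=0.5, prop_mean=2.0, prop_std=1.0)
    ref = exact_mmse_error(1.0, post_mean=2.0, post_std=0.5)
    print(f"importance sampling: {est:.4f}  closed form: {ref:.4f}")
```

In realistic settings the posterior is usually known only up to a normalizing constant and no closed form exists, which is why the weights are self-normalized rather than used directly.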
