ML · LG · Apr 25, 2025

Post-Transfer Learning Statistical Inference in High-Dimensional Regression

arXiv:2504.18212v1 · h-index: 12 · Stat comput
Originality: Highly original
AI Analysis

This paper addresses the lack of reliable statistical inference methods for researchers and practitioners who use transfer learning in high-dimensional regression. It offers a framework for rigorously testing the features that such models select.

The paper tackles the problem of quantifying statistical significance in feature selection for high-dimensional regression with transfer learning, introducing PTL-SI to provide valid p-values that control false positive rates at desired levels like 0.05.

Transfer learning (TL) for high-dimensional regression (HDR) is an important problem in machine learning, particularly when the target task has a limited sample size. However, there is currently no method to quantify the statistical significance of the relationship between features and the response in TL-HDR settings. In this paper, we introduce a novel statistical inference framework for assessing the reliability of feature selection in TL-HDR, called PTL-SI (Post-TL Statistical Inference). The core contribution of PTL-SI is its ability to provide valid $p$-values for features selected in TL-HDR, thereby rigorously controlling the false positive rate (FPR) at a desired significance level $\alpha$ (e.g., 0.05). Furthermore, we enhance statistical power by incorporating a strategic divide-and-conquer approach into our framework. We demonstrate the validity and effectiveness of the proposed PTL-SI through extensive experiments on both synthetic and real-world high-dimensional datasets, confirming its theoretical properties and its utility in testing the reliability of feature selection in TL scenarios.
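
For readers unfamiliar with the term, the standard notion of a "valid" $p$-value can be written as below. This is the usual textbook definition, not a formula quoted from the paper. The symbols $\beta_j$, $H_{0,j}$, and $p_j$ are assumed notation, and the paper's exact null hypothesis and any conditioning on the selection event may differ.

```latex
% Assumed notation: \beta_j is the coefficient of a selected feature j,
% H_{0,j} is the null hypothesis that feature j is unrelated to the response,
% and p_j is the p-value reported for that feature.
H_{0,j} : \beta_j = 0
\qquad \text{vs.} \qquad
H_{1,j} : \beta_j \neq 0 ,
\qquad
\Pr_{H_{0,j}}\left( p_j \le \alpha \right) \le \alpha
\quad \text{for all } \alpha \in [0, 1].
```

Under this definition, rejecting $H_{0,j}$ whenever $p_j \le \alpha$ falsely declares a null feature significant with probability at most $\alpha$, which is what controlling the FPR at level $\alpha$ means.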

