ML LG ME Jun 28, 2023

Transfer Learning with Random Coefficient Ridge Regression

arXiv:2306.15915v15.92 citations h-index: 5

Originality: Incremental advance

AI Analysis

This work aims to improve predictive accuracy in high-dimensional regression settings such as genomics by using transfer learning. The advance is incremental: it builds on existing ridge regression and transfer learning frameworks.

The paper addresses estimation and prediction in random coefficient ridge regression for high-dimensional data by transferring information from related source models. Simulations show that the derived limiting risks match the empirical risks, and an application to polygenic risk scores for lipid traits yields smaller prediction errors than single-sample ridge regression or Lasso-based transfer learning.

Ridge regression with random coefficients provides an important alternative to fixed-coefficient regression in high-dimensional settings when the effects are expected to be small but not zero. This paper considers estimation and prediction of random coefficient ridge regression in the setting of transfer learning, where in addition to observations from the target model, source samples from different but possibly related regression models are available. The informativeness of the source model to the target model can be quantified by the correlation between the regression coefficients. This paper proposes two estimators of regression coefficients of the target model as the weighted sum of the ridge estimates of both target and source models, where the weights can be determined by minimizing the empirical estimation risk or prediction risk. Using random matrix theory, the limiting values of the optimal weights are derived in the setting where $p/n \rightarrow \gamma$, where $p$ is the number of predictors and $n$ is the sample size, which leads to an explicit expression of the estimation or prediction risks. Simulations show that these limiting risks agree very well with the empirical risks. An application to predicting the polygenic risk scores for lipid traits shows that such transfer learning methods lead to smaller prediction errors than single-sample ridge regression or Lasso-based transfer learning.
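The weighted-sum idea in the abstract can be illustrated with a small NumPy sketch. This is not the paper's estimator: the abstract does not spell out how the empirical risk is computed, so the sketch assumes a held-out target validation split and picks the weights by least squares on the validation predictions. The function names, the shared penalty `lam`, the sample sizes, and the coefficient correlation `rho` are all hypothetical choices made for illustration.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X'X + n*lam*I)^{-1} X'y."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)

def combination_weights(X_val, y_val, betas):
    """Weights minimizing squared prediction error on held-out target data
    (an assumed stand-in for the paper's empirical prediction risk)."""
    preds = np.column_stack([X_val @ b for b in betas])  # one column per estimate
    w, *_ = np.linalg.lstsq(preds, y_val, rcond=None)
    return w

def mse(X, y, beta):
    return float(np.mean((y - X @ beta) ** 2))

rng = np.random.default_rng(0)
p, n_fit, n_val, n_source, rho = 50, 40, 20, 400, 0.8  # hypothetical sizes

# Random coefficients: small, dense effects; source and target correlated via rho.
beta_target = rng.normal(scale=p ** -0.5, size=p)
beta_source = rho * beta_target + np.sqrt(1 - rho ** 2) * rng.normal(scale=p ** -0.5, size=p)

def simulate(n, beta):
    X = rng.normal(size=(n, p))
    return X, X @ beta + rng.normal(size=n)

X_fit, y_fit = simulate(n_fit, beta_target)     # target data used to fit ridge
X_val, y_val = simulate(n_val, beta_target)     # target data used to choose weights
X_src, y_src = simulate(n_source, beta_source)  # larger source sample

lam = 0.5  # hypothetical penalty, shared by both fits for simplicity
b_target = ridge(X_fit, y_fit, lam)
b_source = ridge(X_src, y_src, lam)

# Transfer estimator: weighted sum of the target and source ridge estimates.
w = combination_weights(X_val, y_val, [b_target, b_source])
beta_hat = w[0] * b_target + w[1] * b_source

print("weights:", w)
print("validation MSE, target-only ridge:", mse(X_val, y_val, b_target))
print("validation MSE, weighted combination:", mse(X_val, y_val, beta_hat))
```

Because the weights are fit on the validation split, the combination's validation error cannot exceed that of the target-only estimate by construction. An honest comparison would need a separate test sample.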
