ML LG OT Oct 19, 2025

Adaptive Sample Sharing for Linear Regression

Hamza Cherkaoui, Hélène Halconruy, Yohan Petetin

arXiv:2510.16986v1 4.5 h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in business settings by enabling safe sample sharing, though it is incremental as it builds on existing ridge regression and transfer learning frameworks.

The paper tackles the problem of scarce labeled data in supervised learning by developing a data-driven rule for sample sharing in ridge regression, which decides how many auxiliary samples to add to improve predictive error while avoiding negative transfer. The method shows consistent gains over baselines and single-task training in synthetic and real datasets.

In many business settings, task-specific labeled data are scarce or costly to obtain, which limits supervised learning on a specific task. To address this challenge, we study sample sharing in the case of ridge regression: leveraging an auxiliary dataset while explicitly protecting against negative transfer. We introduce a principled, data-driven rule that decides how many samples from an auxiliary dataset to add to the target training set. The rule is based on an estimate of the transfer gain, i.e., the marginal reduction in the predictive error. Building on this estimator, we derive finite-sample guarantees: under standard conditions, the procedure borrows when it improves parameter estimation and abstains otherwise. In the Gaussian feature setting, we analyze which dataset properties ensure that borrowing samples reduces the predictive error. We validate the approach on synthetic and real datasets, observing consistent gains over strong baselines and single-task training while avoiding negative transfer.
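The borrow-or-abstain idea in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the paper's transfer-gain estimator, so the code below swaps in a plain held-out validation error as the selection criterion; this is an assumption made for illustration, not the authors' rule. The function names (`ridge_fit`, `choose_num_shared`), the `step` grid, and the synthetic data are all hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def choose_num_shared(X_tgt, y_tgt, X_aux, y_aux, X_val, y_val, lam=1.0, step=10):
    """Pick how many auxiliary samples to borrow (0 means abstain).

    ASSUMPTION: selection uses held-out validation MSE on the target task,
    a stand-in for the paper's transfer-gain estimate.
    """
    best_m, best_err = 0, None
    for m in range(0, len(y_aux) + 1, step):
        # Pool the target data with the first m auxiliary samples.
        X = np.vstack([X_tgt, X_aux[:m]])
        y = np.concatenate([y_tgt, y_aux[:m]])
        theta = ridge_fit(X, y, lam)
        err = float(np.mean((X_val @ theta - y_val) ** 2))
        # Strict inequality: ties keep the smaller m, so m = 0 wins by default.
        if best_err is None or err < best_err:
            best_m, best_err = m, err
    return best_m, best_err


# Hypothetical synthetic example: a small target set and a related auxiliary task.
rng = np.random.default_rng(0)
d = 5
theta_tgt = np.ones(d)
theta_aux = theta_tgt + 0.1 * rng.standard_normal(d)  # slightly shifted task

X_tgt = rng.standard_normal((30, d))
y_tgt = X_tgt @ theta_tgt + 0.1 * rng.standard_normal(30)
X_val = rng.standard_normal((200, d))
y_val = X_val @ theta_tgt + 0.1 * rng.standard_normal(200)
X_aux = rng.standard_normal((100, d))
y_aux = X_aux @ theta_aux + 0.1 * rng.standard_normal(100)

m_star, val_err = choose_num_shared(X_tgt, y_tgt, X_aux, y_aux, X_val, y_val)
print(f"borrowed {m_star} auxiliary samples, validation MSE {val_err:.4f}")
```

Because `m = 0` is always among the candidates, the selected model never has a higher validation error than target-only training, which mirrors the "abstains otherwise" behavior in spirit. Unlike the paper's procedure, this sketch needs a separate validation split and carries no finite-sample guarantee.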

