LG Jul 17, 2023
Revisiting the Robustness of the Minimum Error Entropy Criterion: A Transfer Learning Case Study
Luis Pedro Silvestrin, Shujian Yu, Mark Hoogendoorn
Coping with distributional shifts is an important part of transfer learning methods in order to perform well in real-life tasks. However, most existing approaches in this area either focus on an ideal scenario in which the data contains no noise, or employ a complicated training paradigm or model design to deal with distributional shifts. In this paper, we revisit the robustness of the minimum error entropy (MEE) criterion, a widely used objective in statistical signal processing for dealing with non-Gaussian noise, and investigate its feasibility and usefulness in real-life transfer learning regression tasks, where distributional shifts are common. Specifically, we put forward a new theoretical result showing the robustness of MEE against covariate shift. We also show that by simply replacing the mean squared error (MSE) loss with the MEE loss in basic transfer learning algorithms such as fine-tuning and linear probing, we can achieve performance competitive with state-of-the-art transfer learning algorithms. We justify our arguments on both synthetic data and five real-world time-series datasets.
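As a rough illustration of the loss swap mentioned in the abstract above, the following sketch computes an empirical MEE objective next to MSE on a vector of regression errors. It is not the authors' implementation: it assumes the common formulation of MEE as Rényi's quadratic entropy of the errors, estimated with a Gaussian Parzen window, and the names `mee_loss`, `mse_loss`, and the bandwidth `sigma` are hypothetical choices made for this example.

```python
import numpy as np


def mee_loss(errors, sigma=1.0):
    """Empirical Renyi quadratic entropy of the errors (Parzen estimate).

    Assumption: Gaussian kernel with bandwidth `sigma`. The pairwise
    kernel has variance 2 * sigma**2 because it is the convolution of
    two Gaussian windows.
    """
    e = np.asarray(errors, dtype=float).ravel()
    diff = e[:, None] - e[None, :]          # all pairwise error differences
    var = 2.0 * sigma ** 2
    kernel = np.exp(-diff ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    information_potential = kernel.mean()   # (1 / N^2) * sum over all pairs
    return -np.log(information_potential)   # minimizing this concentrates errors


def mse_loss(errors):
    """Mean squared error, for comparison."""
    e = np.asarray(errors, dtype=float).ravel()
    return np.mean(e ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(0.0, 0.1, size=200)
    with_outlier = clean.copy()
    with_outlier[0] = 50.0                  # one gross outlier
    print("MSE:", mse_loss(clean), mse_loss(with_outlier))
    print("MEE:", mee_loss(clean), mee_loss(with_outlier))
```

Because this estimate depends only on differences between errors, it is unchanged when every error is shifted by the same constant, so in practice a bias term is usually fitted separately after training with such a loss.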
ML Feb 10, 2022
Transfer-Learning Across Datasets with Different Input Dimensions: An Algorithm and Analysis for the Linear Regression Case
Luis Pedro Silvestrin, Harry van Zanten, Mark Hoogendoorn et al.
With the development of new sensors and monitoring devices, more sources of data become available as inputs for machine learning models. On the one hand, these can help to improve the accuracy of a model. On the other hand, combining these new inputs with historical data remains a challenge that has not yet been studied in enough detail. In this work, we propose a transfer learning algorithm that combines new and historical data with different input dimensions. This approach is easy to implement and efficient, with computational complexity equivalent to that of the ordinary least-squares method, and it requires no hyperparameter tuning, making it straightforward to apply when the new data is limited. Unlike other approaches, we provide a rigorous theoretical study of its robustness, showing that it cannot be outperformed by a baseline that utilizes only the new data. Our approach achieves state-of-the-art performance on 9 real-life datasets, outperforming the linear DSFT, another linear transfer learning algorithm, and performing comparably to non-linear DSFT.