MLAug 4, 2021Code
MRCpy: A Library for Minimax Risk ClassifiersKartheek Bondugula, Verónica Álvarez, José I. Segovia-Martín et al.
Libraries for supervised classification have enabled the wide-spread usage of machine learning methods. Existing libraries, such as scikit-learn, caret, and mlpack, implement techniques based on the classical empirical risk minimization (ERM) approach. We present a Python library, MRCpy, that implements minimax risk classifiers (MRCs) based on the robust risk minimization (RRM) approach. The library offers multiple variants of MRCs that can provide performance guarantees, enable efficient learning in high dimensions, and adapt to distribution shifts. MRCpy follows an object-oriented approach and adheres to the standards of popular Python libraries, such as scikit-learn, facilitating readability and easy usage together with a seamless integration with other libraries. The source code is available under the GPL-3.0 license at https://github.com/MachineLearningBCAM/MRCpy.
MLMay 15, 2023
Double-Weighting for Covariate Shift AdaptationJosé I. Segovia-Martín, Santiago Mazuelas, Anqi Liu
Supervised learning is often affected by a covariate shift in which the marginal distributions of instances (covariates $x$) of training and testing samples $\mathrm{p}_\text{tr}(x)$ and $\mathrm{p}_\text{te}(x)$ are different but the label conditionals coincide. Existing approaches address such covariate shift by either using the ratio $\mathrm{p}_\text{te}(x)/\mathrm{p}_\text{tr}(x)$ to weight training samples (reweighted methods) or using the ratio $\mathrm{p}_\text{tr}(x)/\mathrm{p}_\text{te}(x)$ to weight testing samples (robust methods). However, the performance of such approaches can be poor under support mismatch or when the above ratios take large values. We propose a minimax risk classification (MRC) approach for covariate shift adaptation that avoids such limitations by weighting both training and testing samples. In addition, we develop effective techniques that obtain both sets of weights and generalize the conventional kernel mean matching method. We provide novel generalization bounds for our method that show a significant increase in the effective sample size compared with reweighted methods. The proposed method also achieves enhanced classification performance in both synthetic and empirical experiments.