ML · LG · ME · Feb 18, 2023

Data Augmentation for Imbalanced Regression

arXiv:2302.09288v1 · 9 citations · h-index: 11
AI Analysis

This paper addresses bias in regression estimates fitted on imbalanced data, particularly in actuarial contexts. The contribution appears incremental, as it builds on existing resampling and augmentation methods.

The paper tackles imbalanced data in regression, where imbalance in the covariates can bias the estimates. It proposes an algorithm that combines a data augmentation (DA) step, which explores a wider support than the original sample, with a weighted resampling (WR) step, which drives the covariate distribution toward a chosen target. The advantages are illustrated through a numerical study and an actuarial application.

In this work, we consider the problem of imbalanced data in a regression framework when the imbalanced phenomenon concerns continuous or discrete covariates. Such a situation can lead to biases in the estimates. In this case, we propose a data augmentation algorithm that combines a weighted resampling (WR) and a data augmentation (DA) procedure. In a first step, the DA procedure permits exploring a wider support than the initial one. In a second step, the WR method drives the exogenous distribution to a target one. We discuss the choice of the DA procedure through a numerical study that illustrates the advantages of this approach. Finally, an actuarial application is studied.
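
The two steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the DA step here is a simple Gaussian jitter of resampled (x, y) pairs (a crude stand-in, since the paper compares several DA procedures), the covariate is one-dimensional, the target distribution is uniform on [0, 1], and the WR weights are taken as the target density divided by a kernel density estimate. The names `gaussian_kde_1d`, `augment_then_resample`, `noise_scale`, and `target_pdf` are hypothetical.

```python
import numpy as np

def gaussian_kde_1d(points, query, bandwidth):
    """Plain Gaussian kernel density estimate of `points`, evaluated at `query`."""
    diffs = (query[:, None] - points[None, :]) / bandwidth
    norm = len(points) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * diffs**2).sum(axis=1) / norm

def augment_then_resample(x, y, target_pdf, n_aug, n_out, noise_scale, rng):
    # Step 1 (DA, assumed form): resample pairs with replacement and jitter them,
    # so the augmented covariates cover a slightly wider support than the sample.
    idx = rng.integers(0, len(x), size=n_aug)
    x_aug = x[idx] + rng.normal(0.0, noise_scale * x.std(), size=n_aug)
    y_aug = y[idx] + rng.normal(0.0, noise_scale * y.std(), size=n_aug)

    # Step 2 (WR, assumed form): weight each augmented point by
    # target density / estimated density, then resample with those weights.
    bandwidth = 1.06 * x_aug.std() * n_aug ** (-0.2)  # rule-of-thumb bandwidth
    density = gaussian_kde_1d(x_aug, x_aug, bandwidth)
    weights = target_pdf(x_aug) / density
    weights /= weights.sum()
    keep = rng.choice(n_aug, size=n_out, replace=True, p=weights)
    return x_aug[keep], y_aug[keep]

# Toy data: covariate concentrated near 0.2, with few observations above 0.5.
rng = np.random.default_rng(0)
x = rng.beta(2.0, 8.0, size=500)
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.1, size=500)

# Hypothetical target: uniform density on [0, 1].
uniform_pdf = lambda v: ((v >= 0.0) & (v <= 1.0)).astype(float)

x_bal, y_bal = augment_then_resample(
    x, y, uniform_pdf, n_aug=2000, n_out=1000, noise_scale=0.1, rng=rng
)
```

The resampled pairs `(x_bal, y_bal)` can then be passed to any regression fit in place of the original sample. Jittering `y` independently of `x` slightly blurs the regression relationship, which is one reason the choice of DA procedure matters.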

Code Implementations: 1 repo