Stochastic Doubly Robust Gradient
This work addresses the handling of missing data in machine learning for practitioners working with observational datasets, though it appears incremental in that it adapts existing doubly robust estimators to SGD.
The paper tackles biased parameter estimation and unfair decision outcomes in machine learning models trained on observational data whose missingness depends systematically on covariates. It proposes the Stochastic Doubly Robust Gradient (SDRG) method, which is shown empirically to converge when training image classifiers on several examples of missing data.
When training a machine learning model on observational data, some values are often systematically missing. Learning from incomplete data in which the missingness depends on some covariates may lead to biased parameter estimates and may even harm the fairness of decision outcomes. This paper proposes a way to adjust for the causal effect of covariates on the missingness when training models using stochastic gradient descent (SGD). Inspired by the design of the doubly robust estimator and its theoretical property of double robustness, we introduce the stochastic doubly robust gradient (SDRG), which consists of two components: weight-corrected gradients for inverse propensity score weighting and per-covariate control variates for regression adjustment. We also identify the connection between double robustness and variance reduction in SGD by presenting the SDRG algorithm within a unifying framework for variance-reduced SGD. The performance of our approach is tested empirically by showing convergence when training image classifiers with several examples of missing data.
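To make the two ingredients named above concrete, here is a minimal sketch of how an inverse-propensity weight and a control variate can be combined into one gradient estimate. It follows the classical augmented inverse propensity weighting (AIPW) form of the doubly robust estimator and is not taken from the paper; the function name `dr_gradient`, its arguments, and the way control variates are supplied are all assumptions made for illustration.

```python
import numpy as np

def dr_gradient(grads, observed, propensity, control):
    """AIPW-style average of per-example gradients (illustrative sketch only).

    grads:      (n, d) per-example gradients; rows with observed == 0 are ignored
    observed:   (n,) 1 if the example is observed, 0 if it is missing
    propensity: (n,) estimated probability of being observed given the covariate
    control:    (n, d) control variate assigned to each example's covariate value
    """
    grads = np.asarray(grads, dtype=float)
    control = np.asarray(control, dtype=float)
    # Inverse propensity weight; zero for missing examples.
    w = np.asarray(observed, dtype=float) / np.asarray(propensity, dtype=float)
    # Residual between the gradient and its control variate. Missing rows
    # are masked out so placeholder values (even NaN) never leak through.
    residual = np.where(w[:, None] > 0, grads - control, 0.0)
    # Regression adjustment plus weight-corrected residual, averaged over all n.
    return (control + w[:, None] * residual).mean(axis=0)

# Hypothetical toy data: four examples, two parameters, second example missing.
true_grads = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
seen = np.array([1, 0, 1, 1])
masked_grads = true_grads.copy()
masked_grads[1] = np.nan  # the gradient of a missing example is unavailable

# With exact control variates, the estimate equals the full-data mean
# even though these propensities are deliberately wrong.
wrong_propensity = np.array([0.9, 0.2, 0.5, 0.7])
estimate = dr_gradient(masked_grads, seen, wrong_propensity, true_grads)
```

In this toy case the control variates are exact, so the residual term vanishes and the misspecified propensities do no harm; this is one half of the double robustness property. The other half, exact propensities with imperfect control variates, holds only in expectation over the missingness and is therefore not demonstrated by a single deterministic call.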