Bias Reduction via End-to-End Shift Learning: Application to Citizen Science
This paper addresses data bias in citizen science projects such as eBird, where volunteer preferences skew data collection. The contribution appears incremental, since it adapts existing covariate shift methods to this domain.
The paper tackles bias in citizen science data by proposing the Shift Compensation Network (SCN), an end-to-end learning scheme that quantifies and compensates for the distribution shift between scientific objectives and biased citizen-collected data. Applied to eBird bird observational data, SCN outperforms supervised learning models that ignore the bias, and it compares favorably with competing covariate shift models in both effectiveness and the ability to handle massive high-dimensional data.
Citizen science projects are successful at gathering rich datasets for various applications. However, the data collected by citizen scientists are often biased: in particular, they are aligned more with the citizens' preferences than with scientific objectives. We propose the Shift Compensation Network (SCN), an end-to-end learning scheme that learns the shift from the scientific objectives to the biased data while compensating for the shift by re-weighting the training data. Applied to bird observational data from the citizen science project eBird, we demonstrate how SCN quantifies the data distribution shift and outperforms supervised learning models that do not address the data bias. Compared with competing models in the context of covariate shift, we further demonstrate the advantage of SCN in both its effectiveness and its capability of handling massive high-dimensional data.
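
To make the re-weighting idea concrete, here is a minimal sketch of generic covariate shift correction with a discriminator. It is not the SCN architecture: SCN is described as learning the shift and the re-weighted model end to end, whereas this sketch uses a simpler two-stage procedure, and every function name, the synthetic one-dimensional data, and the hyperparameters are assumptions made for illustration. A logistic-regression discriminator is trained to tell biased samples from target samples, and its odds are turned into importance weights that approximate p_target(x) / p_biased(x).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, sample_weight=None, lr=0.5, steps=2000):
    """Weighted logistic regression fitted by plain gradient descent."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # append a bias column
    coef = np.zeros(d + 1)
    sw = np.ones(n) if sample_weight is None else np.asarray(sample_weight, float)
    sw = sw / sw.sum()  # normalise so the step size does not depend on n
    for _ in range(steps):
        p = sigmoid(Xb @ coef)
        coef -= lr * (Xb.T @ (sw * (p - y)))  # gradient of weighted log-loss
    return coef


def predict_proba(coef, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return sigmoid(Xb @ coef)


def shift_weights(X_biased, X_target):
    """Estimate importance weights ~ p_target(x) / p_biased(x) on X_biased."""
    X = np.vstack([X_biased, X_target])
    y = np.concatenate([np.zeros(len(X_biased)), np.ones(len(X_target))])
    disc = fit_logistic(X, y)  # discriminator: biased (0) vs target (1)
    p = np.clip(predict_proba(disc, X_biased), 1e-6, 1 - 1e-6)
    # Odds of "target", corrected for the two sample sizes, give the ratio.
    ratio = p / (1 - p) * (len(X_biased) / len(X_target))
    return ratio / ratio.mean()  # normalise to mean 1


def demo(seed=0, n=2000):
    """Synthetic illustration: biased data sit around +1, target around -1."""
    rng = np.random.default_rng(seed)
    X_biased = rng.normal(1.0, 1.0, size=(n, 1))   # hypothetical citizen data
    X_target = rng.normal(-1.0, 1.0, size=(n, 1))  # hypothetical target sites
    y_biased = (X_biased[:, 0] + rng.normal(0, 0.5, n) > 0).astype(float)
    w = shift_weights(X_biased, X_target)
    # Re-weighted supervised model trained on the biased, labelled data.
    model = fit_logistic(X_biased, y_biased, sample_weight=w)
    return X_biased, w, model
```

Calling `demo()` returns the biased features, their estimated weights, and a classifier trained with those weights. In this synthetic setting, samples that resemble the target distribution (smaller `x`) receive larger weights, so the weighted training set leans toward the region the hypothetical scientific objective cares about.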