Design-unbiased statistical learning in survey sampling
This work addresses a foundational gap in survey sampling by allowing more powerful machine-learning models to serve as assisting models. The contribution is incremental in that it builds on existing model-assisted practice, but it introduces a novel theoretical framework.
The paper tackles the lack of a general theory for incorporating modern machine-learning techniques into design-consistent model-assisted estimation in survey sampling. It proposes a subsampling Rao-Blackwell method that can yield considerable efficiency gains over standard linear methods while keeping estimation valid and robust to misspecification of the assisting model.
Design-consistent model-assisted estimation has become the standard practice in survey sampling. However, a general theory that allows one to incorporate modern machine-learning techniques, which can lead to potentially much more powerful assisting models, is still lacking. We propose a subsampling Rao-Blackwell method, and develop a statistical learning theory for exactly design-unbiased estimation with the help of linear or non-linear prediction models. Our approach makes use of classic ideas from Statistical Science as well as the rapidly growing field of Machine Learning. Given rich auxiliary information, it can yield considerable efficiency gains over standard linear model-assisted methods, while ensuring valid estimation for the given target population that is robust against potential misspecification of the assisting model at the individual level.
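To make the idea of subsampling followed by Rao-Blackwellization concrete, the following is a minimal sketch and not the paper's implementation. It assumes simple random sampling without replacement, an auxiliary variable x known for every population unit, and a least-squares line as a stand-in assisting model; the function names (fit_predict, split_estimate, rb_estimate) and the training-set size are hypothetical. For each split of the sample into a training part s1 and a test part s2, the model is fitted on s1 only, and the total is estimated as the observed y-total in s1, plus the predictions for all units outside s1, plus an expanded sum of test residuals. Conditional on s1, the test part is itself a simple random sample from the units outside s1, so each split estimate is exactly design-unbiased whatever the model; averaging over splits of the same sample is the Rao-Blackwell step.

```python
from itertools import combinations

import numpy as np


def fit_predict(x_train, y_train, x_new):
    # Assumed assisting model: a least-squares line. Any learner trained
    # only on (x_train, y_train) could be swapped in here.
    A = np.column_stack([np.ones(len(x_train)), x_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return beta[0] + beta[1] * x_new


def split_estimate(x, y, sample, train):
    # x holds auxiliary values for all N units; y is indexed like x,
    # but only entries belonging to `sample` are ever read.
    N = len(x)
    train = np.asarray(train)
    test = np.setdiff1d(sample, train)          # s2 = s \ s1
    rest = np.setdiff1d(np.arange(N), train)    # U \ s1
    mu_rest = fit_predict(x[train], y[train], x[rest])
    mu_test = fit_predict(x[train], y[train], x[test])
    # Expand test residuals from s2 to U \ s1 (s2 is an SRS from U \ s1).
    correction = (N - len(train)) / len(test) * (y[test] - mu_test).sum()
    return y[train].sum() + mu_rest.sum() + correction


def rb_estimate(x, y, sample, n_train, n_splits=None, rng=None):
    # Average split estimates over all splits (n_splits=None) or over
    # n_splits random splits as a Monte Carlo approximation.
    sample = np.asarray(sample)
    if n_splits is None:
        splits = combinations(sample, n_train)
    else:
        rng = np.random.default_rng(rng)
        splits = (rng.choice(sample, size=n_train, replace=False)
                  for _ in range(n_splits))
    return float(np.mean([split_estimate(x, y, sample, np.array(t))
                          for t in splits]))
```

On a toy population, averaging rb_estimate over every possible sample should reproduce the true total up to floating-point error, which is one way to check the unbiasedness claim numerically under the stated design assumption. With random splits instead of all splits, the estimator remains unbiased but carries some extra Monte Carlo variance.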