Kernel Machines With Missing Responses
This work addresses missing responses in regression and classification, a problem relevant to both researchers and practitioners. The contribution appears incremental, however, because it builds on existing kernel machine methods.
The authors address missing responses by developing kernel machines that handle such data, and they use oracle inequalities to prove consistency and derive convergence rates.
Missing responses are a form of missing data in which outcomes are not always observed. In this work we develop kernel machines that can handle missing responses. First, we propose a kernel machine family that uses mainly the complete cases. For the quadratic loss, we then propose a family of doubly-robust kernel machines. The proposed kernel-machine estimators can be applied to both regression and classification problems. We prove oracle inequalities for the finite-sample differences between the kernel machine risk and the Bayes risk. We use these oracle inequalities to prove consistency and to calculate convergence rates. We demonstrate the performance of the two proposed kernel machine families using both a simulation study and a real-world data analysis.
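To make the complete-case idea concrete, here is a minimal sketch of one way such an estimator could look for the quadratic loss. The abstract does not state the form of the estimator, so everything here is an assumption for illustration: the sketch weights each complete case by the inverse of its probability of being observed (taken as known and passed in as `pi`), uses an RBF kernel, and solves the resulting weighted kernel ridge system. The names `rbf_kernel`, `fit_weighted_krr`, `predict`, `lam`, and `gamma` are hypothetical, and this should not be read as the authors' method.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_weighted_krr(X, y, observed, pi, lam=1e-2, gamma=1.0):
    # Assumption: pi[i] is the known probability that response i is observed.
    # Only complete cases enter the fit; each is weighted by 1 / pi[i].
    X_cc = X[observed]
    y_cc = y[observed]
    w = 1.0 / pi[observed]
    n = X.shape[0]  # full sample size, including cases with missing responses
    K = rbf_kernel(X_cc, X_cc, gamma)
    # Minimizing sum_i w_i (y_i - f(x_i))^2 + lam * n * ||f||^2 over
    # f = sum_j alpha_j k(x_j, .) gives (W K + lam * n * I) alpha = W y.
    A = w[:, None] * K + lam * n * np.eye(len(y_cc))
    alpha = np.linalg.solve(A, w * y_cc)
    return X_cc, alpha

def predict(model, X_new, gamma=1.0):
    # Evaluate the fitted function at new covariate values.
    X_cc, alpha = model
    return rbf_kernel(X_new, X_cc, gamma) @ alpha

# Toy usage: responses are more likely to be missing for larger x.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y_full = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
pi = 1.0 / (1.0 + np.exp(X[:, 0] - 1.0))    # observation probabilities
observed = rng.uniform(size=200) < pi
y = np.where(observed, y_full, np.nan)       # missing responses stored as NaN

model = fit_weighted_krr(X, y, observed, pi)
y_hat = predict(model, X)
```

In this sketch the weights only correct for selection into the complete cases. A doubly-robust variant would additionally use a model for the response given the covariates, which is not shown here.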