Exact Distribution-Free Hypothesis Tests for the Regression Function of Binary Classification via Conditional Kernel Mean Embeddings
The proposed tests give researchers and practitioners in machine learning a rigorous statistical tool for validating regression functions in classification tasks, although the contribution is incremental because it builds on existing kernel embedding methods.
The paper tackles the problem of testing the regression function in binary classification by proposing two distribution-free hypothesis tests built on conditional kernel mean embeddings. Both tests control the exact type I error probability for any sample size, and both are proven consistent, meaning that their type II error probabilities converge to zero.
In this paper we suggest two statistical hypothesis tests for the regression function of binary classification based on conditional kernel mean embeddings. The regression function is a fundamental object in classification, as it determines both the Bayes optimal classifier and the misclassification probabilities. A resampling-based framework is presented and combined with consistent point estimators of the conditional kernel mean map in order to construct distribution-free hypothesis tests. These tests are introduced in a flexible manner that allows us to control the exact probability of type I error for any sample size. We also prove that both proposed techniques are consistent under weak statistical assumptions, i.e., the type II error probabilities converge pointwise to zero.
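The resampling idea can be illustrated with a minimal sketch of a rank-based test. It is not the paper's exact construction, and it rests on several assumptions: labels take values in {-1, +1}, so the regression function is f(x) = E[Y | X = x] and P(Y = +1 | X = x) = (1 + f(x)) / 2; inputs are one-dimensional; and a Gaussian kernel ridge regression fit stands in for the conditional kernel mean embedding estimate. The names `rank_test`, `krr_fit_values`, `bandwidth`, `ridge`, `m` and `q` are hypothetical. Under the null hypothesis f = f0, the m - 1 label vectors resampled from f0 are, conditionally on the inputs, exchangeable with the observed labels, so the rank of the observed statistic (with random tie-breaking) is uniform on {1, ..., m} and rejecting when the rank exceeds q gives type I error probability exactly (m - q) / m.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel Gram matrix between two 1-D arrays of inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def krr_fit_values(x, y, bandwidth=0.3, ridge=1e-2):
    """Kernel ridge regression fit evaluated at the training inputs.

    Assumption: used here as a stand-in for the conditional kernel
    mean embedding estimate of the regression function.
    """
    n = len(x)
    K = gaussian_gram(x, x, bandwidth)
    alpha = np.linalg.solve(K + n * ridge * np.eye(n), y)
    return K @ alpha


def rank_test(x, y, f0, m=40, q=38, rng=None):
    """Resampling-based rank test of H0: regression function equals f0.

    Returns (reject, rank). Under H0 the rejection probability is
    (m - q) / m, e.g. 0.05 for m = 40 and q = 38.
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    f0_vals = f0(x)
    p_plus = (1.0 + f0_vals) / 2.0  # P(Y = +1 | X = x) under H0

    # Sample 0 is the observed one; the others are drawn from f0.
    samples = [np.asarray(y, dtype=float)]
    for _ in range(m - 1):
        samples.append(np.where(rng.random(n) < p_plus, 1.0, -1.0))

    # Reference variable: squared distance between each fit and f0.
    z = np.array(
        [np.mean((krr_fit_values(x, s) - f0_vals) ** 2) for s in samples]
    )

    # Rank of the observed statistic, with ties broken at random.
    ties = rng.random(m)
    order = np.lexsort((ties, z))  # primary key: z, secondary key: ties
    rank = int(np.where(order == 0)[0][0]) + 1
    return rank > q, rank
```

For instance, `rank_test(x, y, lambda t: np.zeros_like(t))` tests whether the labels are independent fair coin flips given the inputs. The choice of m and q fixes the significance level exactly, independently of the sample size, while the quality of the estimator only affects the power of the test.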