A Statistical Test for Joint Distributions Equivalence
This work provides a method for verifying dataset-shift in machine learning; the contribution is incremental, as it builds on existing kernel tests.
The paper tackles the problem of determining if two joint distributions are statistically different using a distribution-free test, extending kernel two-sample tests to joint distributions and enabling verification of dataset-shift in learning frameworks without assumptions about the shift type.
We provide a distribution-free test that can be used to determine whether any two joint distributions $p$ and $q$ are statistically different by inspection of a large enough set of samples. Following recent efforts from Long et al. [1], we rely on joint kernel distribution embedding to extend the kernel two-sample test of Gretton et al. [2] to the case of joint probability distributions. Our main result can be directly applied to verify if a dataset-shift has occurred between training and test distributions in a learning framework, without further assuming the shift has occurred only in the input, in the target or in the conditional distribution.
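The abstract does not spell out the test statistic or its threshold, so the following is only a minimal sketch of the general idea, under stated assumptions: Gaussian kernels on inputs and targets, a product kernel on the pair $(x, y)$ as the joint embedding, the biased squared-MMD estimate of the kernel two-sample test, and a permutation threshold swapped in for whatever bound the paper derives. The function names and bandwidth parameters (`rbf_gram`, `joint_mmd2`, `permutation_test`, `bw_x`, `bw_y`) are hypothetical and do not come from the paper.

```python
import numpy as np


def rbf_gram(a, b, bandwidth):
    """Gaussian RBF Gram matrix between the rows of a and the rows of b."""
    sq = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def _joint_gram(x, y, bw_x, bw_y):
    # Assumed joint kernel: k((x, y), (x', y')) = k_x(x, x') * k_y(y, y').
    return rbf_gram(x, x, bw_x) * rbf_gram(y, y, bw_y)


def _mmd2_from_gram(k, n):
    # Biased (V-statistic) squared MMD; the first n rows are the sample from p.
    return k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean()


def joint_mmd2(x1, y1, x2, y2, bw_x=1.0, bw_y=1.0):
    """Squared MMD between samples (x1, y1) ~ p and (x2, y2) ~ q."""
    x, y = np.vstack([x1, x2]), np.vstack([y1, y2])
    return _mmd2_from_gram(_joint_gram(x, y, bw_x, bw_y), len(x1))


def permutation_test(x1, y1, x2, y2, n_perm=200, bw_x=1.0, bw_y=1.0, seed=0):
    """Return (statistic, p_value) for H0: p(x, y) = q(x, y).

    The permutation threshold is an assumption of this sketch, not
    necessarily the threshold used in the paper.
    """
    rng = np.random.default_rng(seed)
    x, y = np.vstack([x1, x2]), np.vstack([y1, y2])
    k = _joint_gram(x, y, bw_x, bw_y)
    n = len(x1)
    observed = _mmd2_from_gram(k, n)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(x))  # reshuffle the pooled (x, y) pairs
        if _mmd2_from_gram(k[np.ix_(idx, idx)], n) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy shift in the conditional only: both samples draw x from N(0, 1) and the
# marginal of y has the same variance, but y depends on x with opposite sign.
rng = np.random.default_rng(1)
x_train = rng.normal(size=(100, 1))
y_train = x_train + 0.3 * rng.normal(size=(100, 1))
x_test = rng.normal(size=(100, 1))
y_test = -x_test + 0.3 * rng.normal(size=(100, 1))
stat, p_value = permutation_test(x_train, y_train, x_test, y_test)
print(f"MMD^2 = {stat:.4f}, permutation p-value = {p_value:.4f}")
```

Because the kernel acts on the pair $(x, y)$, the sketch does not need to know whether the shift sits in the input, the target or the conditional distribution; a small p-value simply indicates that the two joint samples differ.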