ML LG ME Oct 14, 2019

Two-sample Testing Using Deep Learning

Matthias Kirchler, Shahryar Khorasani, Marius Kloft, Christoph Lippert

arXiv:1910.06239v2 14.749 citations h-index: 35 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more efficient and accurate two-sample testing in domains like audio, images, and neuroimaging, representing an incremental improvement over existing kernel and classifier-based tests.

The paper proposes a deep learning-based two-sample testing procedure that uses learned neural network representations to define consistent test statistics with linear-time evaluation. In the reported experiments, this reduces type-2 error rates by up to 35 percentage points compared to state-of-the-art methods.

We propose a two-sample testing procedure based on learned deep neural network representations. To this end, we define two test statistics that perform an asymptotic location test on data samples mapped onto a hidden layer. The tests are consistent and asymptotically control the type-1 error rate. Their test statistics can be evaluated in linear time (in the sample size). Suitable data representations are obtained in a data-driven way, by solving a supervised or unsupervised transfer-learning task on an auxiliary (potentially distinct) data set. If no auxiliary data is available, we split the data into two chunks: one for learning representations and one for computing the test statistic. In experiments on audio samples, natural images, and three-dimensional neuroimaging data, our tests yield significant decreases in type-2 error rate (up to 35 percentage points) compared to state-of-the-art two-sample tests such as kernel methods and classifier two-sample tests.
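To make the idea of an asymptotic location test on hidden-layer features concrete, here is a minimal sketch in Python. It is not the paper's implementation: it assumes a Hotelling-type statistic with a pooled covariance estimate and a chi-square reference distribution, which may differ from the two statistics the authors define. The names `location_test` and `toy_feature_map` are hypothetical, and the fixed random `tanh` projection merely stands in for a representation learned on auxiliary or held-out data.

```python
import numpy as np
from scipy import stats


def location_test(feats_x, feats_y, ridge=1e-8):
    """Hotelling-type location test on feature vectors (illustrative assumption).

    feats_x: (n, d) array of features for the first sample.
    feats_y: (m, d) array of features for the second sample.
    Returns (statistic, asymptotic p-value). Cost is linear in n + m.
    """
    n, d = feats_x.shape
    m = feats_y.shape[0]
    mean_x = feats_x.mean(axis=0)
    mean_y = feats_y.mean(axis=0)
    diff = mean_x - mean_y

    # Pooled covariance of the features, with a small ridge for stability.
    cx = feats_x - mean_x
    cy = feats_y - mean_y
    pooled = (cx.T @ cx + cy.T @ cy) / (n + m - 2)
    pooled += ridge * np.eye(d)

    # Under the null (equal feature means) the statistic is asymptotically
    # chi-square distributed with d degrees of freedom.
    stat = (n * m / (n + m)) * diff @ np.linalg.solve(pooled, diff)
    p_value = stats.chi2.sf(stat, df=d)
    return float(stat), float(p_value)


def toy_feature_map(x, weights):
    """Stand-in for a hidden layer: a fixed projection followed by tanh."""
    return np.tanh(x @ weights)


rng = np.random.default_rng(0)
weights = rng.normal(size=(10, 5)) / np.sqrt(10)  # hypothetical, not learned
x = rng.normal(size=(500, 10))
y_same = rng.normal(size=(500, 10))           # same distribution as x
y_shift = rng.normal(loc=0.5, size=(500, 10))  # mean-shifted distribution

stat_same, p_same = location_test(
    toy_feature_map(x, weights), toy_feature_map(y_same, weights)
)
stat_shift, p_shift = location_test(
    toy_feature_map(x, weights), toy_feature_map(y_shift, weights)
)
print(f"same distribution:    stat={stat_same:.2f}, p={p_same:.3f}")
print(f"shifted distribution: stat={stat_shift:.2f}, p={p_shift:.3g}")
```

The null hypothesis would be rejected when the p-value falls below the chosen significance level. In the data-splitting variant described above, the feature map would be fit on one chunk and `location_test` applied only to features of the other chunk, so that the representation is independent of the data used for testing.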
