LG NE

Dec 17, 2022

Two-sample test based on Self-Organizing Maps

Alejandro Álvarez-Ayllón, Manuel Palomo-Duarte, Juan-Manuel Dodero

arXiv:2212.08960v1

h-index: 23

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for interpretable two-sample tests in statistics and machine learning, serving users who require both statistical inference and visualization. The contribution is incremental: it adapts an existing visualization tool to a known testing framework.

The paper tackles two-sample statistical testing with machine-learning classifiers, which often lack interpretability. It proposes a method based on Self-Organizing Maps that tests whether two samples come from different populations and also provides visual insight into how they differ.

Machine-learning classifiers can be leveraged as a two-sample statistical test. Suppose each sample is assigned a different label and a classifier discriminates between them with better-than-chance accuracy. In this case, we can infer that the samples originate from different populations. However, many types of models, such as neural networks, behave as a black box for the user: they can reject the hypothesis that both samples originate from the same population, but they offer no insight into how the samples differ. Self-Organizing Maps are a dimensionality reduction technique initially devised as a data visualization tool that displays emergent properties, and they are also useful for classification tasks. Since they can be used as classifiers, they can also be used as a two-sample statistical test. And since their original purpose is visualization, they can also offer insights.
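The classifier-as-test idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: a 1-nearest-neighbour classifier stands in for the Self-Organizing Map, and the function names (`classifier_two_sample_test`, `binomial_tail`, `nearest_label`), the 50/50 train/test split, and the exact binomial p-value are all assumptions made for illustration. The logic is that, if both samples come from the same population, held-out accuracy should be close to chance (0.5), so accuracy well above chance is evidence against the null hypothesis.

```python
import math
import random


def binomial_tail(successes, trials, p=0.5):
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def nearest_label(point, train):
    """Label of the training point closest to `point` (1-nearest-neighbour)."""
    return min(train, key=lambda item: math.dist(point, item[0]))[1]


def classifier_two_sample_test(sample_a, sample_b, test_fraction=0.5, seed=0):
    """Return (held-out accuracy, one-sided p-value) for H0: same population."""
    rng = random.Random(seed)
    # Assign a different label to each sample, then pool and shuffle.
    data = [(x, 0) for x in sample_a] + [(x, 1) for x in sample_b]
    rng.shuffle(data)
    # Hold out part of the pooled data to score the classifier.
    n_test = int(len(data) * test_fraction)
    test, train = data[:n_test], data[n_test:]
    correct = sum(nearest_label(x, train) == y for x, y in test)
    # Under H0 each held-out prediction is right with probability ~0.5.
    return correct / n_test, binomial_tail(correct, n_test)


if __name__ == "__main__":
    rng = random.Random(42)
    sample_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
    sample_b = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(100)]
    accuracy, p_value = classifier_two_sample_test(sample_a, sample_b)
    print(f"accuracy={accuracy:.2f}, p-value={p_value:.3g}")
```

A small p-value here only says that the two samples are distinguishable. The nearest-neighbour stand-in gives no picture of where they differ, which is the gap the abstract says a Self-Organizing Map is meant to fill.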
