LGNEDec 17, 2022

Two-sample test based on Self-Organizing Maps

arXiv:2212.08960v1h-index: 23
Originality Synthesis-oriented
AI Analysis

This addresses the need for interpretable two-sample tests in statistics and machine learning, offering a solution for users who require both statistical inference and visualization, though it is incremental as it adapts an existing visualization tool to a known testing framework.

The paper tackles the problem of two-sample statistical testing using machine-learning classifiers, which often lack interpretability, by proposing a method based on Self-Organizing Maps that not only tests if samples come from different populations but also provides visual insights into their differences.

Machine-learning classifiers can be leveraged as a two-sample statistical test. Suppose each sample is assigned a different label and that a classifier can obtain a better-than-chance result discriminating them. In this case, we can infer that both samples originate from different populations. However, many types of models, such as neural networks, behave as a black-box for the user: they can reject that both samples originate from the same population, but they do not offer insight into how both samples differ. Self-Organizing Maps are a dimensionality reduction initially devised as a data visualization tool that displays emergent properties, being also useful for classification tasks. Since they can be used as classifiers, they can be used also as a two-sample statistical test. But since their original purpose is visualization, they can also offer insights.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes