ME ML · Jan 2, 2021

Measure of Strength of Evidence for Visually Observed Differences between Subpopulations

Xi Yang, Jan Hannig, Katherine A. Hoadley, Iain Carmichael, J. S. Marron

arXiv:2101.00362v3

Originality: Incremental advance

AI Analysis

This work addresses a problem faced by researchers analyzing complex datasets such as modern cancer data: how to reliably quantify subpopulation differences that are observed visually, in settings where traditional statistical methods may be inadequate.

This paper proposes the Population Difference Criterion (PDC) to assess the statistical significance of visually observed subpopulation differences, particularly in high-dimensional and high-signal contexts where traditional methods struggle. It also introduces a balanced permutation approach that is more powerful in high-signal contexts and quantifies the uncertainty due to permutation variation with a bootstrap confidence interval, demonstrating its utility on cancer data.

To measure the strength of visually observed subpopulation differences, the Population Difference Criterion is proposed to assess their statistical significance. It addresses two challenges: in high-dimensional contexts, distributional models can be dubious, and in high-signal contexts, conventional permutation tests give poor pairwise comparisons. We also make two other contributions. First, based on a careful analysis, we find that a balanced permutation approach is more powerful in high-signal contexts than conventional permutations. Second, we quantify the uncertainty due to permutation variation via a bootstrap confidence interval. The practical usefulness of these ideas is illustrated by comparing subpopulations of modern cancer data.
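The abstract contrasts conventional and balanced permutations and attaches a bootstrap interval to the resulting p-value. The sketch below illustrates those two ideas under stated assumptions; it is not the authors' implementation. The abstract does not define the PDC, so the code uses a hypothetical stand-in statistic, `mean_distance` (Euclidean distance between group mean vectors). `balanced_relabel` follows the usual two-group definition of a balanced permutation, in which each relabeled group receives exactly half its cases from each original group (equal, even group sizes are assumed). `bootstrap_p_ci` resamples the permutation statistics to show how much the estimated p-value varies with the permutations drawn, which is one plausible reading of "uncertainty due to permutation variation". All function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Group = Sequence[Vector]


def mean_vector(rows: Group) -> Vector:
    """Coordinate-wise mean of a group of d-dimensional observations."""
    d = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(d)]


def mean_distance(a: Group, b: Group) -> float:
    """Hypothetical stand-in statistic (NOT the PDC): distance between group means."""
    ma, mb = mean_vector(a), mean_vector(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ma, mb)))


def conventional_relabel(a: Group, b: Group, rng: random.Random) -> Tuple[list, list]:
    """Shuffle the pooled cases and split them back into the original group sizes."""
    pooled = list(a) + list(b)
    rng.shuffle(pooled)
    return pooled[: len(a)], pooled[len(a):]


def balanced_relabel(a: Group, b: Group, rng: random.Random) -> Tuple[list, list]:
    """Swap exactly half of each group, so every relabeled group holds
    n/2 cases from each original group (assumes equal, even sizes)."""
    n = len(a)
    if len(b) != n or n % 2:
        raise ValueError("this sketch assumes two groups of equal, even size")
    swap_a = set(rng.sample(range(n), n // 2))
    swap_b = set(rng.sample(range(n), n // 2))
    new_a = [a[i] for i in range(n) if i not in swap_a] + [b[i] for i in sorted(swap_b)]
    new_b = [b[i] for i in range(n) if i not in swap_b] + [a[i] for i in sorted(swap_a)]
    return new_a, new_b


def permutation_test(a, b, stat, relabel, n_perm=500, seed=0):
    """Return the observed statistic, its permutation null values, and the p-value."""
    rng = random.Random(seed)
    observed = stat(a, b)
    null = [stat(*relabel(a, b, rng)) for _ in range(n_perm)]
    p = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    return observed, null, p


def bootstrap_p_ci(observed, null, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for the p-value, resampling the null statistics."""
    rng = random.Random(seed)
    m = len(null)
    ps = sorted(
        (1 + sum(s >= observed for s in rng.choices(null, k=m))) / (m + 1)
        for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lo = ps[math.floor(alpha * (n_boot - 1))]
    hi = ps[math.ceil((1 - alpha) * (n_boot - 1))]
    return lo, hi


if __name__ == "__main__":
    rng = random.Random(42)
    d, n = 50, 20  # illustrative high-dimensional toy data, not from the paper
    a = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    b = [[rng.gauss(0.3, 1.0) for _ in range(d)] for _ in range(n)]
    for name, relabel in [("conventional", conventional_relabel), ("balanced", balanced_relabel)]:
        obs, null, p = permutation_test(a, b, mean_distance, relabel)
        lo, hi = bootstrap_p_ci(obs, null)
        print(f"{name}: stat={obs:.3f} p={p:.4f} 95% CI=({lo:.4f}, {hi:.4f})")
```

The two tests differ only in the relabeling function passed to `permutation_test`, so their null distributions can be compared directly on the same data. Swapping in a real PDC implementation would only require replacing `mean_distance`.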

