MEML · Jan 7, 2016

Measuring and Discovering Correlations in Large Data Sets

arXiv:1602.07960v1
Originality: Incremental advance
AI Analysis

This work addresses a common problem for data analysts: identifying correlations in large datasets. It appears incremental, since it builds on existing methods such as Reshef's model and compensates for their limitations.

The authors tackle the problem of measuring and discovering correlations in large datasets by proposing ART, a class of statistics that efficiently and equitably evaluates a wide range of linear and nonlinear bivariate correlations without prior knowledge of the relationship types. They apply it to a dataset of 10 classical American indexes and discover many correlations.

In this paper, a class of statistics named ART (the alternant recursive topology statistics) is proposed to measure the correlation between two variables. ART can evaluate a wide range of bivariate correlations, both linear and nonlinear, efficiently and equitably, even when nothing is known about the specific types of those relationships. ART compensates for two disadvantages of Reshef's model: no exact polynomial-time algorithm exists for it, and it cannot identify the "local random" phenomenon. As a class of nonparametric exploratory statistics, ART is applied to a dataset of 10 classical American indexes, and many bivariate correlations are discovered.
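To make concrete why a correlation measure must handle nonlinear relationships, here is a minimal sketch contrasting Pearson's linear coefficient with a simple binned (plug-in) mutual-information estimate. This is not ART, whose construction the abstract does not describe, and it is not Reshef's MIC either; the function names, the 8×8 equal-width grid, and the test data are illustrative assumptions.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson's linear correlation coefficient r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def binned_mutual_information(xs, ys, bins=8):
    """Plug-in estimate of mutual information (in bits) on an equal-width
    bins x bins grid. Hypothetical helper, not the paper's method."""

    def to_bins(vs):
        lo, hi = min(vs), max(vs)
        # Guard against constant input, where the range is zero.
        width = (hi - lo) / bins or 1.0
        # Clamp the maximum value into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in vs]

    bx, by = to_bins(xs), to_bins(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[i] / n) * (py[j] / n)))
    return mi


if __name__ == "__main__":
    xs = [i / 100 for i in range(-100, 101)]  # symmetric grid on [-1, 1]
    ys = [x * x for x in xs]                   # nonlinear, non-monotone relation
    print("Pearson r:", pearson(xs, ys))
    print("binned MI (bits):", binned_mutual_information(xs, ys))
```

On this symmetric data with y = x², Pearson's r is essentially zero, while the binned estimate is well above 1 bit. Plug-in estimates like this one are biased upward on small samples and depend on the grid chosen, which is part of why grid-based statistics such as MIC normalize the score and search over many grids.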
