ST ME ML

Apr 20, 2012

Modeling, dependence, classification, united statistical science, many cultures

arXiv:1204.4699v3

13 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for a unified statistical science that bridges several cultures (parametric, algorithmic, and nonparametric) and is aimed at researchers in statistics and data science. Its contribution appears incremental, since it mainly extends existing methods.

The paper aims to unify diverse statistical methods for both small and big data sets. It introduces a framework based on comparison density and copula density, together with new measures called LP score comoments that apply to long-tailed distributions. The framework treats discrete and continuous variables in a unified way and extends to high-dimensional data modeling.

Breiman (2001) urged statisticians to be aware of two cultures: 1. Parametric modeling culture, pioneered by R. A. Fisher and Jerzy Neyman; 2. Algorithmic predictive culture, pioneered by machine learning research. Parzen (2001), as part of his discussion of Breiman (2001), proposed that researchers be aware of many cultures, including the focus of our research: 3. Nonparametric, quantile-based, information-theoretic modeling. We provide a unification of many statistical methods for traditional small data sets and emerging big data sets in terms of comparison density, copula density, measures of dependence, correlation, information, and new measures (called LP score comoments) that apply to long-tailed distributions without finite second-order moments. A very important goal is to unify methods for discrete and continuous random variables. Our research extends these methods to modern high-dimensional data modeling.
