ME ML · Jan 15, 2018

Divide and Recombine for Large and Complex Data: Model Likelihood Functions using MCMC

arXiv:1801.05007v1
Originality: Incremental advance
AI Analysis

This is an incremental methodological advance aimed at statisticians and data scientists facing the computational challenges of big data.

The authors tackle the problem of computing likelihood functions for large datasets by proposing a Divide & Recombine procedure that runs MCMC on each subset, fits a parametric density to the draws, and recombines the fitted densities. They demonstrate it with logistic regression models.

In Divide & Recombine (D&R), big data are divided into subsets, each analytic method is applied to each subset, and the outputs are recombined. This enables deep analysis and practical computational performance. An innovative D&R procedure is proposed to compute likelihood functions of data-model (DM) parameters for big data. The likelihood-model (LM) is a parametric probability density function of the DM parameters. The density parameters are estimated by fitting the density to MCMC draws from each subset DM likelihood function, and then the fitted densities are recombined. The procedure is illustrated using normal and skew-normal LMs for the logistic regression DM.
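The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain random-walk Metropolis sampler with a flat prior to draw from each subset likelihood, covers only the normal LM, and recombines by multiplying the fitted normal densities (summing precisions and precision-weighting the means), which assumes the full-data likelihood is the product of the subset likelihoods. All function names, the step size, and the draw counts are hypothetical choices made for the example.

```python
import numpy as np

def log_lik(beta, X, y):
    # Logistic regression log-likelihood of the data-model (DM) parameters.
    z = X @ beta
    return float(np.sum(y * z - np.logaddexp(0.0, z)))

def metropolis(X, y, n_draws, step, rng):
    # Random-walk Metropolis targeting the subset likelihood
    # (flat prior; an assumption made for this sketch).
    beta = np.zeros(X.shape[1])
    ll = log_lik(beta, X, y)
    draws = np.empty((n_draws, X.shape[1]))
    for i in range(n_draws):
        prop = beta + step * rng.standard_normal(beta.size)
        ll_prop = log_lik(prop, X, y)
        if np.log(rng.uniform()) < ll_prop - ll:
            beta, ll = prop, ll_prop
        draws[i] = beta
    return draws

def fit_normal(draws):
    # Normal likelihood-model (LM): estimate its mean and covariance from draws.
    return draws.mean(axis=0), np.atleast_2d(np.cov(draws, rowvar=False))

def recombine_normals(fits):
    # Product of normal densities: precisions add, means are precision-weighted.
    precisions = [np.linalg.inv(cov) for _, cov in fits]
    cov = np.linalg.inv(sum(precisions))
    mean = cov @ sum(P @ m for P, (m, _) in zip(precisions, fits))
    return mean, cov

def divide_and_recombine(X, y, n_subsets, n_draws=4000, burn_in=1000,
                         step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))            # divide: random subsets of rows
    fits = []
    for part in np.array_split(idx, n_subsets):
        draws = metropolis(X[part], y[part], n_draws, step, rng)
        fits.append(fit_normal(draws[burn_in:]))  # fit the LM per subset
    return recombine_normals(fits)           # recombine the fitted densities
```

Calling `divide_and_recombine(X, y, n_subsets=4)` on a design matrix `X` and a 0/1 response `y` returns the mean and covariance of the recombined normal approximation to the full-data likelihood. The skew-normal LM mentioned in the abstract would need a different fitting and recombination step, which this sketch does not attempt.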

