ML LG · Feb 22, 2021

Divide-and-conquer methods for big data analysis

Xueying Chen, Jerry Q. Cheng, Min-ge Xie

arXiv:2102.10771v1 · 1.9

Originality: Synthesis-oriented

AI Analysis

The paper addresses computational challenges in big data analysis that matter to researchers and practitioners, but its contribution is incremental because it reviews existing developments rather than proposing new methods.

The paper reviews divide-and-conquer methods for big data analysis, which split data into smaller sets for separate analysis and combine results to handle memory or computational limits, achieving statistical inference similar to full-data analysis.

In the context of big data analysis, the divide-and-conquer methodology refers to a multi-step process: first splitting a data set into several smaller ones, then analyzing each set separately, and finally combining the results of the separate analyses. This approach is effective for handling large data sets that cannot be analyzed in their entirety on a single computer because of limits on memory storage or computational time. The combined results provide a statistical inference similar to the one obtained from analyzing the entire data set. This article reviews some recent developments in divide-and-conquer methods in a variety of settings, including combining methods based on parametric, semiparametric, and nonparametric models, and online sequential updating methods, among others. Theoretical developments on the efficiency of divide-and-conquer methods are also discussed.
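The three steps described in the abstract (split, analyze each block, combine) can be sketched for one simple case. This is an illustrative example, not a method taken from the paper: it assumes a simple linear regression y = a + b*x fitted by ordinary least squares, and it assumes each block is summarized by its sufficient statistics. The function names (`block_stats`, `combine`, `solve`, `divide_and_conquer_ols`) and the simulated data are hypothetical.

```python
import random

def block_stats(xs, ys):
    """Step 2: summarize one block by the sufficient statistics of y = a + b*x."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n, sx, sy, sxx, sxy)

def combine(stats):
    """Step 3: add the per-block statistics component-wise."""
    return tuple(sum(col) for col in zip(*stats))

def solve(stat):
    """Closed-form least-squares intercept and slope from the statistics."""
    n, sx, sy, sxx, sxy = stat
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

def divide_and_conquer_ols(xs, ys, k):
    # Step 1: split the data into k blocks (every k-th observation).
    blocks = [(xs[i::k], ys[i::k]) for i in range(k)]
    # Step 2: analyze each block separately.
    stats = [block_stats(bx, by) for bx, by in blocks]
    # Step 3: combine the block results into one estimate.
    return solve(combine(stats))

# Simulated data (assumed for illustration): true intercept 1, true slope 2.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(10_000)]
ys = [1.0 + 2.0 * x + random.gauss(0, 1) for x in xs]

full = solve(block_stats(xs, ys))           # full-data fit
dc = divide_and_conquer_ols(xs, ys, k=8)    # fit from 8 blocks
```

In this particular linear case the combined estimate matches the full-data estimate up to floating-point rounding, because the sufficient statistics add across blocks. For the more general models the review covers, the combined result is typically only similar to the full-data result, as the abstract states.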
