LG DB | Aug 24, 2023

An Efficient Data Analysis Method for Big Data using Multiple-Model Linear Regression

arXiv:2308.12691v1 | 3 citations | h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses efficiency challenges in big data analysis for researchers and practitioners. It is an incremental advance: it builds on existing regression techniques and adds a novel partitioning approach.

The paper tackles the problem of analyzing big data by proposing a multiple-model linear regression (MMLR) method that partitions a dataset into subsets and builds a local linear model for each. The method achieves linear time complexity and prediction accuracy comparable to existing regression methods, with significantly shorter execution times.
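To make the "partition, then fit local linear models" idea concrete, here is a minimal sketch under loudly stated assumptions: a single input feature, subsets formed by splitting the feature range into `k` equal-width bins, and one closed-form least-squares line per bin. The class `PiecewiseLinearModel`, the function `fit_line`, and the equal-width partitioning rule are hypothetical illustrations, not the paper's MMLR partitioning criterion.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Line = Tuple[float, float]  # (intercept a, slope b) of y = a + b*x


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Closed-form ordinary least squares for y = a + b*x on one subset."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # all x equal: fall back to a constant model
    return my - b * mx, b


class PiecewiseLinearModel:
    """Hypothetical multiple-model sketch: split x into k equal-width bins,
    fit one local least-squares line per bin, route predictions by bin."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.edges: List[float] = []   # k-1 interior bin boundaries
        self.models: List[Line] = []   # one local line per bin
        self.fallback: Line = (0.0, 0.0)

    def _bin(self, x: float) -> int:
        return bisect_right(self.edges, x)  # bin index in 0..k-1

    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> "PiecewiseLinearModel":
        lo, hi = min(xs), max(xs)
        width = (hi - lo) / self.k
        self.edges = [lo + width * i for i in range(1, self.k)]
        self.fallback = fit_line(xs, ys)  # global line, used only for empty bins
        bins: List[Tuple[List[float], List[float]]] = [([], []) for _ in range(self.k)]
        for x, y in zip(xs, ys):  # one pass; placing each point costs O(log k)
            bx, by = bins[self._bin(x)]
            bx.append(x)
            by.append(y)
        self.models = [fit_line(bx, by) if bx else self.fallback for bx, by in bins]
        return self

    def predict(self, x: float) -> float:
        a, b = self.models[self._bin(x)]  # use the local model of x's subset
        return a + b * x


if __name__ == "__main__":
    # Data that is linear on each half but not globally linear.
    xs = [i / 2 for i in range(-20, 21)]
    ys = [2 * x + 1 if x < 0 else 1 - x for x in xs]
    model = PiecewiseLinearModel(k=2).fit(xs, ys)
    print(model.predict(-5), model.predict(5))
```

With `k=2` the two local lines recover each half of the data almost exactly, which a single global line cannot do. Fitting touches each point a constant number of times per pass, so the cost grows linearly with the number of points for fixed `k`, in the spirit of the linear-time claim.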

This paper introduces a new data analysis method for big data based on a newly defined regression model, multiple-model linear regression (MMLR), which separates the input dataset into subsets and constructs a local linear regression model for each subset. The proposed method is shown to be more efficient and flexible than other regression-based methods. The paper also proposes an approximate algorithm for constructing MMLR models based on an $(ε,δ)$-estimator, and gives mathematical proofs of the correctness and efficiency of the MMLR algorithm, whose time complexity is linear in the size of the input dataset. Finally, the paper implements the method and evaluates it empirically on both synthetic and real-world datasets: the algorithm performs comparably to existing regression methods in many cases, while taking almost the shortest time to reach high prediction accuracy.
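The abstract does not describe how the $(ε,δ)$-estimator is built. As a generic illustration only, the sketch below uses a textbook technique swapped in for it: Hoeffding's inequality gives a sample size $m$ such that the mean of a uniform sample of values lying in an interval of width $R$ is within $ε$ of the true mean with probability at least $1-δ$, and reservoir sampling draws that sample in one pass. The names `hoeffding_sample_size`, `reservoir_sample`, and `approx_mean` are hypothetical; this is not the paper's algorithm.

```python
import math
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def hoeffding_sample_size(eps: float, delta: float, value_range: float) -> int:
    """Smallest m with 2*exp(-2*m*eps^2 / R^2) <= delta (two-sided Hoeffding).

    For values in an interval of width R, the mean of m uniformly sampled
    values is then within eps of the true mean with probability >= 1 - delta.
    Hoeffding's bound also holds for sampling without replacement.
    """
    return math.ceil(value_range ** 2 * math.log(2 / delta) / (2 * eps ** 2))


def reservoir_sample(stream: Iterable[T], m: int, rng: random.Random) -> List[T]:
    """Algorithm R: a single pass that keeps a uniform random sample of m items."""
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < m:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive bounds: item kept with prob m/(i+1)
            if j < m:
                sample[j] = item
    return sample


def approx_mean(stream: Iterable[float], eps: float, delta: float,
                value_range: float, seed: int = 0) -> float:
    """(eps, delta)-style estimate of the mean of a bounded, non-empty stream."""
    m = hoeffding_sample_size(eps, delta, value_range)
    sample = reservoir_sample(stream, m, random.Random(seed))
    return sum(sample) / len(sample)  # if the stream has <= m items, this is exact
```

For example, `hoeffding_sample_size(0.1, 0.05, 1.0)` returns 185: that many samples suffice regardless of how large the dataset is, while the reservoir pass itself is linear in the input size.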
