LG · SP · Nov 10, 2024

Fitting Multiple Machine Learning Models with Performance Based Clustering

arXiv:2411.06572v2 · 1 citation · h-index: 28 · IEEE Signal Processing Letters
Originality: Incremental advance
AI Analysis

This paper addresses poor model performance on real-life data that comes from multiple generating mechanisms, though the contribution appears incremental because it builds on existing clustering and ensemble methods.

The paper tackles the suboptimal performance that results from assuming a single data-generating mechanism. It introduces a clustering framework that groups data by feature-target relations and learns a separate model for each group, and it reports significant improvements over traditional single-model approaches on real-life datasets.

Traditional machine learning approaches assume that data comes from a single generating mechanism, which may not hold for most real-life data. In these cases, the single-mechanism assumption can result in suboptimal performance. We introduce a clustering framework that eliminates this assumption by grouping the data according to the relations between the features and the target values, and we obtain multiple separate models to learn different parts of the data. We further extend our framework to applications with streaming data, where we produce outcomes using an ensemble of models whose weights are updated based on the incoming data batches. We demonstrate the performance of our approach on widely studied real-life datasets, showing significant improvements over traditional single-model approaches.
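The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way the idea could look, not the authors' method. It assumes linear models, squared error as the performance measure, an alternating fit-and-reassign loop for the clustering, and an exponential (multiplicative) weight update for the streaming ensemble. The function names `fit_performance_clusters` and `update_ensemble_weights`, and the parameters `n_models`, `n_iters`, and `eta`, are hypothetical.

```python
import numpy as np


def _add_bias(X):
    """Append a column of ones so each linear model has an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_performance_clusters(X, y, n_models=2, n_iters=20, seed=0):
    """Group samples by which model predicts them best (assumed scheme).

    Alternates between (1) fitting one least-squares model per cluster and
    (2) reassigning every sample to the model with the smallest squared error.
    """
    rng = np.random.default_rng(seed)
    Xb = _add_bias(X)
    labels = rng.integers(n_models, size=Xb.shape[0])  # random initial clusters
    coefs = np.zeros((n_models, Xb.shape[1]))
    for _ in range(n_iters):
        for k in range(n_models):
            mask = labels == k
            # Keep the previous coefficients if a cluster has too few points.
            if mask.sum() >= Xb.shape[1]:
                coefs[k] = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)[0]
        # Squared error of every model on every sample: (n_samples, n_models).
        errors = (Xb @ coefs.T - y[:, None]) ** 2
        new_labels = errors.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return coefs, labels


def update_ensemble_weights(weights, coefs, X_batch, y_batch, eta=1.0):
    """Reweight the models after a new batch (assumed exponential update).

    Models with a lower mean squared error on the batch gain weight.
    """
    preds = _add_bias(X_batch) @ coefs.T
    losses = ((preds - y_batch[:, None]) ** 2).mean(axis=0)
    new_weights = weights * np.exp(-eta * losses)
    return new_weights / new_weights.sum()
```

In this sketch, a streaming prediction for a batch would be the weighted sum of the per-model predictions, with `update_ensemble_weights` called once the batch's targets become available. Like any alternating scheme of this kind, the clustering step depends on its random initialization and may settle in a poor local solution.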

Code Implementations: 1 repo
