LG SP | Nov 10, 2024

Fitting Multiple Machine Learning Models with Performance Based Clustering

Mehmet Efe Lorasdagi, Ahmet Berker Koc, Ali Taha Koc, Suleyman Serdar Kozat

arXiv:2411.06572v24.61 citations | h-index: 26 | Has Code | IEEE Signal Processing Letters

Originality: Incremental advance

AI Analysis

The paper addresses poor model performance on real-life data produced by multiple generating mechanisms, though the contribution appears incremental because it builds on existing clustering and ensemble methods.

The paper tackles the problem of suboptimal performance from assuming a single data-generating mechanism by introducing a clustering framework that groups data based on feature-target relations and learns multiple separate models, showing significant improvements over traditional single-model approaches on real-life datasets.
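The grouping idea summarized above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes linear least-squares models, squared error as the performance measure, and a k-means-style alternation (refit each model on its group, then reassign each sample to the model that predicts it best). The function name `fit_performance_clusters` and its parameters are hypothetical.

```python
import numpy as np

def fit_performance_clusters(X, y, n_models=2, n_iters=20, seed=0):
    """Alternate between fitting one linear model per group and
    reassigning each sample to the model with the smallest squared error.
    Illustrative assumption, not the method from the paper."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append a bias column
    labels = rng.integers(n_models, size=X.shape[0])  # random initial groups
    weights = np.zeros((n_models, Xb.shape[1]))
    for _ in range(n_iters):
        # Fit step: least-squares fit of each model on its current group.
        for k in range(n_models):
            mask = labels == k
            if mask.any():
                weights[k] = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)[0]
        # Assignment step: each sample goes to its best-performing model.
        errors = (Xb @ weights.T - y[:, None]) ** 2   # shape (n_samples, n_models)
        new_labels = errors.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                     # assignments are stable
        labels = new_labels
    return weights, labels
```

Because samples are grouped by which model explains their target best, rather than by feature similarity alone, two samples with nearly identical features can land in different groups when their targets follow different feature-target relations.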

Traditional machine learning approaches assume that data comes from a single generating mechanism, which may not hold for most real-life data. In these cases, the single-mechanism assumption can result in suboptimal performance. We introduce a clustering framework that eliminates this assumption by grouping the data according to the relations between the features and the target values, and we obtain multiple separate models to learn different parts of the data. We further extend our framework to applications with streaming data, where we produce outcomes using an ensemble of models. For this, the ensemble weights are updated based on the incoming data batches. We demonstrate the performance of our approach on widely studied real-life datasets, showing significant improvements over traditional single-model approaches.
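The abstract does not state how the ensemble weights are updated from incoming batches. As one plausible illustration, the sketch below uses an exponential-weights rule driven by each model's mean squared error on the latest batch. The rule itself, the learning rate `eta`, and the function names are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def ensemble_predict(preds, w):
    """Combine per-model predictions (shape: batch x n_models)
    using the current ensemble weights w (shape: n_models)."""
    return preds @ w

def update_weights(w, preds, y, eta=1.0):
    """Exponential-weights update (assumed rule): models with a lower
    mean squared error on the incoming batch gain weight."""
    losses = np.mean((preds - y[:, None]) ** 2, axis=0)
    # Subtracting the minimum loss avoids underflow without changing
    # the normalized result.
    new_w = w * np.exp(-eta * (losses - losses.min()))
    return new_w / new_w.sum()
```

In a streaming loop, one would call `ensemble_predict` on each new batch before its targets are revealed, then call `update_weights` once they arrive, so the weights always reflect past batches only.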
