AP CO ME ML · Sep 20, 2019

A clusterwise supervised learning procedure based on aggregation of distances

Aurélie Fisher, Mathilde Mougeot, Sothea Has

arXiv:1909.09370v3 · 1.2

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of fitting predictive models to clustered data, a problem relevant to machine learning practitioners, but it appears incremental, as it builds on existing clustering and aggregation techniques.

The authors tackle supervised learning when the input data contain several unknown clusters, each with a different underlying predictive model. They propose a three-step procedure, called KFC, that aggregates models adaptively, and report excellent performance on simulated and real data.

Nowadays, many machine learning procedures are available off the shelf and can easily be used to calibrate predictive models on supervised data. However, when the input data consist of more than one unknown cluster, and different underlying predictive models exist across these clusters, fitting a model is a more challenging task. In this paper, we propose a three-step procedure that solves this problem automatically. The KFC procedure aggregates different models adaptively to the data. The first step aims at capturing the clustering structure of the input data, which may be characterized by several statistical distributions; it provides several partitions, one for each distributional assumption. In the second step, for each partition, a specific predictive model is fitted on the data of each cluster. In the third step, the overall model is computed by a consensual aggregation of the models corresponding to the different partitions. Comparisons on various simulated and real datasets demonstrate the excellent performance of our method across a wide variety of prediction problems.
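The three steps described in the abstract can be sketched in Python (NumPy only). This is a minimal sketch under stated assumptions, not the authors' implementation. The "several statistical distributions" of step 1 are represented here by three illustrative Bregman divergences (squared Euclidean, generalized Kullback-Leibler and Itakura-Saito) plugged into a K-means loop, which is valid because the mean is the optimal centroid for any Bregman divergence. Step 2 fits an ordinary least-squares model in each cluster. Step 3 combines the candidates with a Gaussian-kernel consensual aggregation on the vectors of their predictions, computed on a separate aggregation sample with a fixed, untuned bandwidth `h`. All function names, the farthest-first initialisation, the simulated data and the parameter values are hypothetical.

```python
import numpy as np

# Illustrative Bregman divergences d(x, c), evaluated for every row of X (n, d)
# against one centroid c (d,). ASSUMPTION: this choice of divergences is ours;
# the GKL and Itakura-Saito entries require strictly positive data.
DIVERGENCES = {
    "euclidean": lambda X, c: np.sum((X - c) ** 2, axis=1),
    "gkl": lambda X, c: np.sum(X * np.log(X / c) - X + c, axis=1),
    "itakura_saito": lambda X, c: np.sum(X / c - np.log(X / c) - 1.0, axis=1),
}

def assign(X, centers, div):
    """Index of the closest centroid (under `div`) for each row of X."""
    return np.argmin(np.stack([div(X, c) for c in centers], axis=1), axis=1)

def bregman_kmeans(X, k, div, n_iter=50):
    """Step 1 (one partition): Lloyd iterations; for any Bregman divergence the
    optimal centroid of a cluster is its arithmetic mean."""
    centers = [X[0]]  # hypothetical farthest-first initialisation from the first row
    for _ in range(1, k):
        nearest = np.min([div(X, c) for c in centers], axis=0)
        centers.append(X[np.argmax(nearest)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers, div)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])  # keep empty clusters
        if np.allclose(new, centers):
            break
        centers = new
    return centers

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bd]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def fit_candidate(X, y, k, div):
    """Steps 1 and 2 for one divergence: partition X, then fit one model per cluster."""
    centers = bregman_kmeans(X, k, div)
    labels = assign(X, centers, div)
    fallback = fit_linear(X, y)  # used for clusters too small to fit on their own
    coefs = [fit_linear(X[labels == j], y[labels == j])
             if np.sum(labels == j) > X.shape[1] else fallback for j in range(k)]
    return centers, coefs

def predict_candidate(model, X, div):
    """Route each row to the linear model of its cluster."""
    centers, coefs = model
    labels = assign(X, centers, div)
    out = np.empty(len(X))
    for j, coef in enumerate(coefs):
        mask = labels == j
        out[mask] = coef[0] + X[mask] @ coef[1:]
    return out

def kernel_consensus(R_agg, y_agg, R_new, h=1.0):
    """Step 3: Gaussian-kernel consensual aggregation. Each new prediction is a
    weighted mean of y_agg, favouring aggregation points whose candidate
    predictions R_agg (n, M) are close to the new point's R_new (m, M)."""
    sq = np.sum((R_new[:, None, :] - R_agg[None, :, :]) ** 2, axis=2)  # (m, n)
    w = np.exp(-(sq - sq.min(axis=1, keepdims=True)) / h ** 2)  # max weight is 1
    return w @ y_agg / w.sum(axis=1)

def kfc_fit_predict(X_tr, y_tr, X_agg, y_agg, X_new, k, h=1.0):
    """Fit one candidate per divergence on (X_tr, y_tr), aggregate on (X_agg, y_agg)."""
    models = {n: fit_candidate(X_tr, y_tr, k, d) for n, d in DIVERGENCES.items()}

    def candidate_preds(X):
        return np.column_stack([predict_candidate(models[n], X, d)
                                for n, d in DIVERGENCES.items()])

    return kernel_consensus(candidate_preds(X_agg), y_agg, candidate_preds(X_new), h)

# Usage on hypothetical simulated data: two input clusters, two linear models.
rng = np.random.default_rng(0)

def simulate(n):
    XA = rng.uniform(1, 3, size=(n // 2, 2))
    XB = rng.uniform(8, 10, size=(n - n // 2, 2))
    y = np.concatenate([2 * XA[:, 0] + 3 * XA[:, 1], 50 - 3 * XB[:, 0] - 2 * XB[:, 1]])
    X, y = np.vstack([XA, XB]), y + rng.normal(0, 0.5, n)
    perm = rng.permutation(n)
    return X[perm], y[perm]

X_tr, y_tr = simulate(300)
X_agg, y_agg = simulate(300)
X_te, y_te = simulate(200)
y_hat = kfc_fit_predict(X_tr, y_tr, X_agg, y_agg, X_te, k=2)
print("test MSE:", np.mean((y_hat - y_te) ** 2), "var(y):", np.var(y_te))
```

The aggregation weights are computed on a sample distinct from the one used to train the candidates, so that the candidates' predictions on it are not overfitted. In this toy example all three divergences recover the same two clusters, so the candidates agree and the aggregate mainly smooths the noise; with real data the partitions, and hence the candidates, would typically differ, and `h` would need tuning (for instance by cross-validation).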

