ML LG Feb 6, 2019

Un modèle Bayésien de co-clustering de données mixtes (A Bayesian Co-clustering Model for Mixed-Type Data)

Aichetou Bouchareb, Marc Boullé, Fabrice Rossi, Fabrice Clérot

arXiv:1902.02056v1 1.21 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of exploratory data analysis for large mixed-type datasets, offering a parameter-free co-clustering method, though it appears incremental in nature.

The authors address co-clustering of mixed-type data tables with a MAP Bayesian approach: the model infers an optimal segmentation of every variable, then co-clusters by minimizing a Bayesian model selection cost function. The resulting method requires no user parameters, and its criterion gives an exact measure of model quality, expressed as the probability that the model fits the data. Experiments on real data illustrate its use for exploratory analysis of large datasets.

We propose a MAP Bayesian approach to perform and evaluate a co-clustering of mixed-type data tables. The proposed model first infers an optimal segmentation of all variables, then performs a co-clustering by minimizing a Bayesian model selection cost function. One advantage of this approach is that it requires no user parameters. Another is the proposed criterion itself, which gives an exact measure of model quality, expressed as the probability that the model fits the data. Continued optimization of this criterion yields progressively better models while avoiding over-fitting. Experiments conducted on real data demonstrate the value of this co-clustering approach for exploratory analysis of large data sets.
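To make the idea of minimizing a Bayesian model selection cost concrete, here is a minimal sketch on a much simpler problem than the paper's: choosing how many equal-width intervals to use when segmenting a single numeric variable. It is not the authors' criterion. The function names `map_cost` and `best_bin_count`, the uniform prior over the number of intervals, and the uniform Dirichlet prior over interval probabilities are all assumptions made for illustration. The cost is the negative log of prior times likelihood, so the interval count with the lowest cost is the MAP choice, and no user-tuned smoothing parameter is involved.

```python
import math


def map_cost(values, k, k_max):
    """Negative log of prior * likelihood for a k-interval equal-width
    segmentation of one numeric variable (illustrative criterion only,
    not the one defined in the paper)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span <= 0:
        raise ValueError("values must not all be equal")
    n = len(values)

    # Count how many values fall into each equal-width interval.
    counts = [0] * k
    for x in values:
        idx = min(int((x - lo) / span * k), k - 1)  # clip the maximum into the last interval
        counts[idx] += 1

    # Assumed prior: every interval count from 1 to k_max is equally likely.
    neg_log_prior = math.log(k_max)

    # Marginal likelihood of the interval assignments, with the interval
    # probabilities integrated out under a uniform Dirichlet prior.
    neg_log_assign = (
        math.lgamma(n + k)
        - math.lgamma(k)
        - sum(math.lgamma(c + 1) for c in counts)
    )

    # Inside an interval of width span / k, values are modelled as uniform.
    neg_log_within = n * math.log(span / k)

    return neg_log_prior + neg_log_assign + neg_log_within


def best_bin_count(values, k_max=10):
    """Return the MAP number of intervals among 1..k_max."""
    return min(range(1, k_max + 1), key=lambda k: map_cost(values, k, k_max))
```

Under these assumptions, evenly spread values are best described by a single interval, because extra intervals add model cost without improving the fit, while values concentrated in two well-separated groups justify a finer segmentation. This is the same trade-off between fit and complexity that the abstract attributes to its criterion.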
