Distributed MCMC inference for Bayesian Non-Parametric Latent Block Model
This work addresses computational bottlenecks in co-clustering for applications such as gene expression analysis, but the contribution is incremental, since it adapts existing methods to a distributed setting.
The paper tackles the problem of scaling Bayesian non-parametric co-clustering by introducing a distributed MCMC inference method, and reports improved cluster labeling accuracy and reduced execution times in its experiments.
In this paper, we introduce a novel Distributed Markov Chain Monte Carlo (MCMC) inference method for the Bayesian Non-Parametric Latent Block Model (DisNPLBM), which employs a Master/Worker architecture. Our non-parametric co-clustering algorithm partitions both observations (rows) and features (columns), modeling each block with a latent multivariate Gaussian distribution. Rows are distributed evenly among the workers, which communicate only with the master and never with one another. Experimental results demonstrate the impact of DisNPLBM on cluster labeling accuracy and execution time. Moreover, we present a real-world use case that applies our approach to co-clustering gene expression data. The source code is publicly available at https://github.com/redakhoufache/Distributed-NPLBM.
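The Master/Worker communication pattern described above (rows split evenly across workers, each worker reporting only to the master) can be illustrated with a small sketch. This is a hypothetical example, not the DisNPLBM implementation. The function names (`split_rows`, `worker_block_stats`, `master_aggregate`, `run_round`) and the simplification to univariate Gaussian cells are assumptions. The sketch also assumes that workers return per-block sufficient statistics (count, sum, sum of squares) computed under the current row and column labels. Threads stand in for distributed workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Block = Tuple[int, int]      # (row-cluster id, column-cluster id)
Stats = List[float]          # [count, sum, sum of squares]


def split_rows(n_rows: int, n_workers: int) -> List[range]:
    """Evenly split row indices into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n_rows, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def worker_block_stats(data: Sequence[Sequence[float]], rows: range,
                       row_labels: Sequence[int],
                       col_labels: Sequence[int]) -> Dict[Block, Stats]:
    """Hypothetical worker step: accumulate Gaussian sufficient statistics
    per block using only this worker's rows (assumes scalar cells)."""
    stats: Dict[Block, Stats] = defaultdict(lambda: [0, 0.0, 0.0])
    for i in rows:
        for j, x in enumerate(data[i]):
            s = stats[(row_labels[i], col_labels[j])]
            s[0] += 1
            s[1] += x
            s[2] += x * x
    return dict(stats)


def master_aggregate(partials: List[Dict[Block, Stats]]) -> Dict[Block, Stats]:
    """Hypothetical master step: sum the additive statistics sent by the workers."""
    total: Dict[Block, Stats] = defaultdict(lambda: [0, 0.0, 0.0])
    for part in partials:
        for block, (n, sx, sxx) in part.items():
            t = total[block]
            t[0] += n
            t[1] += sx
            t[2] += sxx
    return dict(total)


def run_round(data, row_labels, col_labels, n_workers: int) -> Dict[Block, Stats]:
    """One communication round: each worker sees only its own rows and reports
    only to the master; workers never exchange messages with one another."""
    chunks = split_rows(len(data), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(
            lambda rows: worker_block_stats(data, rows, row_labels, col_labels),
            chunks))
    return master_aggregate(partials)


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    stats = run_round(data, row_labels=[0, 0, 1, 1], col_labels=[0, 1], n_workers=2)
    for block, (n, sx, _) in sorted(stats.items()):
        print(block, "mean =", sx / n)
```

Because these statistics are additive, the master's aggregate matches what a single machine would compute over all rows. As a result, the sketch sends only small per-block summaries to the master rather than raw rows.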