ML LG ME | Dec 18, 2019

Boltzmann Exploration Expectation-Maximisation

arXiv:1912.08869v13.23 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a general challenge in mixture model fitting for researchers and practitioners, offering a method that is less sensitive to initialization, though it appears incremental as it builds on existing expectation-maximization frameworks.

The authors tackled the problem of fitting finite mixture models, which often suffer from sensitivity to parameter initialization, by proposing BEEM, a stochastic learning algorithm that uses Boltzmann exploration for cluster assignments. They demonstrated competitive performance on synthetic benchmarks and real-world datasets.

We present a general method for fitting finite mixture models (FMMs). Learning in a mixture model consists of finding the most likely cluster assignment for each data point, as well as finding the parameters of the clusters themselves. In many mixture models, this is difficult with current learning methods, where the most common approach is to employ monotone learning algorithms, e.g. the conventional expectation-maximisation (EM) algorithm. While effective, the success of any monotone algorithm is crucially dependent on good parameter initialisation; a common choice is $K$-means initialisation, typically used for Gaussian mixture models. For other types of mixture models, the path to good initialisation parameters is often unclear and may require a problem-specific solution. To this end, we propose a general heuristic learning algorithm that utilises Boltzmann exploration to assign each observation to a specific base distribution within the mixture model, which we call Boltzmann exploration expectation-maximisation (BEEM). With BEEM, hard assignments allow straightforward parameter learning for each base distribution by conditioning only on its assigned observations. Consequently, it can be applied to mixtures of any base distribution where single-component parameter learning is tractable. The stochastic learning procedure is able to escape local optima and is thus insensitive to parameter initialisation. We show competitive performance on a number of synthetic benchmark cases as well as on real-world datasets.
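The idea in the abstract (sample a hard component label for each observation from a Boltzmann distribution, then refit each component on only its assigned observations) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `boltzmann_assign` and `beem_sketch_1d`, the restriction to one-dimensional Gaussian components, the use of per-component log-likelihoods as the Boltzmann scores, and the geometric temperature schedule. The authors' actual algorithm may differ in all of these.

```python
import numpy as np

def boltzmann_assign(log_scores, temperature, rng):
    """Sample one hard label per row from softmax(log_scores / temperature)."""
    scaled = log_scores / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling: one uniform draw per observation.
    cdf = probs.cumsum(axis=1)
    u = rng.random((probs.shape[0], 1))
    return np.minimum((u > cdf).sum(axis=1), probs.shape[1] - 1)

def beem_sketch_1d(x, k, n_iter=100, t_start=5.0, t_end=0.1, seed=0):
    """Hypothetical BEEM-style loop for a 1-D Gaussian mixture (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    means = rng.choice(x, size=k, replace=False)   # arbitrary initialisation
    stds = np.full(k, x.std() + 1e-6)
    # Assumed schedule: high temperature (exploration) to low (near-greedy).
    for temperature in np.geomspace(t_start, t_end, n_iter):
        # Log-density of every observation under every component.
        log_lik = (-np.log(stds) - 0.5 * np.log(2.0 * np.pi)
                   - 0.5 * ((x[:, None] - means) / stds) ** 2)
        z = boltzmann_assign(log_lik, temperature, rng)
        # Hard assignments make the update a per-component maximum-likelihood fit.
        for j in range(k):
            members = x[z == j]
            if members.size >= 2:
                means[j] = members.mean()
                stds[j] = max(members.std(), 1e-3)
    weights = np.bincount(z, minlength=k) / x.size  # mixing proportions
    return means, stds, weights, z
```

Calling `beem_sketch_1d(x, k=2)` on a one-dimensional array `x` returns the fitted means, standard deviations, mixing proportions, and the last sampled labels. Because each component is refitted only from its own observations, the Gaussian update in the inner loop could in principle be swapped for any base distribution with a tractable single-component fit, which is the generality the abstract claims.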
