LG · AI · Apr 21, 2021

Understanding and Accelerating EM Algorithm's Convergence by Fair Competition Principle and Rate-Verisimilitude Function

arXiv:2104.12592v1 · 4 citations

Originality: Incremental advance

AI Analysis

This work addresses convergence difficulties in EM algorithms for mixture models, an incremental advance of interest to statistics and machine learning practitioners.

The paper studies the convergence behavior of the EM algorithm for mixture models. Contrary to prior theories, it proves that the complete data log-likelihood Q may need to decrease for the observed data log-likelihood L to increase. It also proposes a Fair Competition Principle with an initialization map that greatly reduces running times for binary Gaussian mixtures.

Why does the Expectation-Maximization (EM) algorithm for mixture models converge? Why do different initial parameters cause different convergence difficulties? The Q-L synchronization theory holds that the observed data log-likelihood L and the complete data log-likelihood Q are positively correlated, so that maximizing Q maximizes L. Following this theory, the authors of the Deterministic Annealing EM (DAEM) algorithm made great efforts to eliminate local maxima of Q in order to avoid local convergence of L. However, this paper proves that in some cases Q may, and should, decrease for L to increase; slow or local convergence occurs only because of small samples and unfair competition. This paper uses marriage competition to explain different convergence difficulties and proposes the Fair Competition Principle (FCP) with an initialization map for improving initializations. It uses the rate-verisimilitude function, extended from the rate-distortion function, to explain the convergence of the EM and improved EM algorithms. This convergence proof adopts the variational and iterative methods that Shannon et al. used to analyze rate-distortion functions. The initialization map can greatly reduce the running times of both algorithms for binary Gaussian mixtures. The FCP and the initialization map are useful for complicated mixtures but not sufficient; specific methods require further study.
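The claim that Q may fall while L rises can be made concrete with a standard identity: if the posteriors P(y_j|x_i) are computed from the current parameters, then L = Q + H, where H is the entropy of those posteriors. Q therefore tracks L only when H changes little. The sketch below is plain textbook EM for a 1-D binary Gaussian mixture, not the paper's improved EM, FCP, or initialization map; the function names, parameter layout `(pi1, mu1, s1, mu2, s2)`, synthetic data, and starting values are all hypothetical choices for illustration. It records L, Q, and H at every iteration so their relationship can be inspected.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_binary_gaussian(xs, params, iters=50):
    """Textbook EM for a 1-D two-component Gaussian mixture (hypothetical helper).

    params = (pi1, mu1, s1, mu2, s2). For the parameters theta_t at each
    iteration, records (L, Q, H):
      L = sum_i log P(x_i | theta_t)                      (observed data)
      Q = sum_i sum_j P(y_j|x_i) log P(x_i, y_j | theta_t) (complete data)
      H = -sum_i sum_j P(y_j|x_i) log P(y_j|x_i)          (posterior entropy)
    so that L == Q + H up to rounding.
    """
    pi1, mu1, s1, mu2, s2 = params
    history = []
    for _ in range(iters):
        # E-step: joint probabilities P(x, y_j) and posteriors P(y_j | x)
        L = Q = H = 0.0
        resp = []  # posterior of component 1 for each sample
        for x in xs:
            p1 = pi1 * normal_pdf(x, mu1, s1)
            p2 = (1.0 - pi1) * normal_pdf(x, mu2, s2)
            px = p1 + p2
            g1, g2 = p1 / px, p2 / px
            resp.append(g1)
            L += math.log(px)
            for g, p in ((g1, p1), (g2, p2)):
                if g > 0.0:  # g > 0 implies p > 0, so log(p) is safe
                    Q += g * math.log(p)
                    H -= g * math.log(g)
        history.append((L, Q, H))
        # M-step: weighted maximum-likelihood updates
        n1 = sum(resp)
        n2 = len(xs) - n1
        mu1 = sum(g * x for g, x in zip(resp, xs)) / n1
        mu2 = sum((1.0 - g) * x for g, x in zip(resp, xs)) / n2
        s1 = math.sqrt(sum(g * (x - mu1) ** 2 for g, x in zip(resp, xs)) / n1)
        s2 = math.sqrt(sum((1.0 - g) * (x - mu2) ** 2 for g, x in zip(resp, xs)) / n2)
        pi1 = n1 / len(xs)
    return history

# Usage with hypothetical synthetic data: two well-separated clusters.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + \
       [random.gauss(4.0, 1.0) for _ in range(200)]
hist = em_binary_gaussian(data, (0.5, -1.0, 2.0, 1.0, 2.0), iters=30)
for t in (0, 1, 2, 29):
    L, Q, H = hist[t]
    print(f"iter {t:2d}: L={L:9.2f}  Q={Q:9.2f}  H={H:7.2f}")
```

Standard EM guarantees that L never decreases, but nothing in the identity forces Q(θ_t|θ_t) to move in the same direction: a step that raises the posterior entropy H can raise L while lowering Q. Whether that happens in a given run depends on the data and the initialization, which is why the sketch prints all three quantities rather than asserting a particular trajectory.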
