Maximum Likelihood Learning With Arbitrary Treewidth via Fast-Mixing Parameter Sets
This work provides theoretical guarantees for the MCMC-based learning commonly used in practice for exponential families, addressing a bottleneck for researchers and practitioners working with intractable models.
The paper tackles maximum likelihood learning in high-treewidth models by proving that gradient descent with MCMC-sampled gradients approximates the maximum likelihood solution within a fast-mixing parameter set. Ignoring logarithmic factors, reaching epsilon-accuracy takes effort cubic in 1/epsilon in the unregularized case (accuracy in log-likelihood) and quadratic in 1/epsilon in the ridge-regularized case (accuracy in parameter distance).
Inference is typically intractable in high-treewidth undirected graphical models, making maximum likelihood learning a challenge. One way to overcome this is to restrict parameters to a tractable set, most typically the set of tree-structured parameters. This paper explores an alternative notion of a tractable set, namely a set of "fast-mixing parameters" where Markov chain Monte Carlo (MCMC) inference can be guaranteed to quickly converge to the stationary distribution. While it is common in practice to approximate the likelihood gradient using samples obtained from MCMC, such procedures lack theoretical guarantees. This paper proves that for any exponential family with bounded sufficient statistics (not just graphical models), when parameters are constrained to a fast-mixing set, gradient descent with gradients approximated by sampling will approximate the maximum likelihood solution inside the set with high probability. When unregularized, finding a solution epsilon-accurate in log-likelihood requires a total amount of effort cubic in 1/epsilon, disregarding logarithmic factors. When ridge-regularized, strong convexity allows a solution epsilon-accurate in parameter distance with effort quadratic in 1/epsilon. Both results provide a fully polynomial-time randomized approximation scheme.
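As an illustration of the kind of procedure analyzed, the following is a minimal sketch of projected gradient learning with MCMC-estimated gradients. It is not the paper's algorithm or its constants. Everything concrete in it is an assumption made for the example: a small pairwise Ising model on {-1, +1} variables, a Gibbs sampler, a Euclidean norm ball standing in for the fast-mixing set (the actual set depends on the model and is not specified in this abstract), iterate averaging, and hypothetical names such as `fit`, `project`, and `radius`. It relies only on the standard fact that the log-likelihood gradient of an exponential family is the data mean of the sufficient statistics minus their model expectation.

```python
import numpy as np

def suff_stats(x):
    """Sufficient statistics of a pairwise Ising model: x_i * x_j for all i < j."""
    iu = np.triu_indices(len(x), k=1)
    return np.outer(x, x)[iu]

def gibbs_sample(theta, n, n_sweeps, rng):
    """Run n_sweeps Gibbs sweeps from a random start; return the final state."""
    W = np.zeros((n, n))
    W[np.triu_indices(n, k=1)] = theta  # same pair ordering as suff_stats
    W = W + W.T                         # symmetric couplings, zero diagonal
    x = rng.choice([-1.0, 1.0], size=n)
    for _ in range(n_sweeps):
        for i in range(n):
            field = W[i] @ x            # diagonal is zero, so x[i] has no effect
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            x[i] = 1.0 if rng.random() < p_plus else -1.0
    return x

def project(theta, radius):
    """Euclidean projection onto a norm ball (assumed stand-in for a fast-mixing set)."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)

def fit(data, radius=0.5, n_iters=50, n_samples=100, n_sweeps=5,
        step=0.5, ridge=0.0, seed=0):
    """Projected gradient ascent on the (optionally ridge-penalized) log-likelihood."""
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    data_mean = np.mean([suff_stats(x) for x in data], axis=0)
    theta = np.zeros(n * (n - 1) // 2)
    avg = np.zeros_like(theta)
    for _ in range(n_iters):
        # Estimate the model expectation of the sufficient statistics by MCMC.
        samples = [gibbs_sample(theta, n, n_sweeps, rng) for _ in range(n_samples)]
        model_mean = np.mean([suff_stats(x) for x in samples], axis=0)
        # Noisy gradient: data mean minus model mean, minus the ridge term.
        grad = data_mean - model_mean - ridge * theta
        theta = project(theta + step * grad, radius)
        avg += theta
    # Averaged iterate; it stays inside the ball because the ball is convex.
    return avg / n_iters

# Toy data in which variables 0 and 1 always agree and variable 2 is unrelated.
data = np.array([[1, 1, 1], [-1, -1, 1], [1, 1, -1], [-1, -1, -1]], dtype=float)
theta_hat = fit(data, n_iters=20, n_samples=50, n_sweeps=3)
print(theta_hat)  # couplings ordered as pairs (0,1), (0,2), (1,2)
```

In this sketch the total effort is the product of the number of iterations, the samples per iteration, and the Gibbs sweeps per sample. The abstract's cubic and quadratic bounds in 1/epsilon refer to how such a total must grow to reach a target accuracy; the particular values used above are arbitrary.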