LG · ML · Jun 13, 2012

Constrained Approximate Maximum Entropy Learning of Markov Random Fields

Varun Ganapathi, David Vickrey, John Duchi, Daphne Koller

arXiv:1206.3257v1 · 37 citations


AI Analysis

This work addresses efficient and accurate parameter estimation in Markov random fields (MRFs). It offers a framework that, on the real-world networks tested, can outperform two existing approximate methods: learning with loopy belief propagation and piecewise training.

The paper tackles parameter estimation in Markov random fields by introducing a constrained approximate maximum entropy learning approach. It combines MRF learning with the Bethe approximation while allowing parameter sharing, regularization, and conditional training. The reported results show that the proposed algorithms can significantly outperform learning with loopy belief propagation and piecewise training on several real-world networks.

Parameter estimation in Markov random fields (MRFs) is a difficult task, in which inference over the network is run in the inner loop of a gradient descent procedure. Replacing exact inference with approximate methods such as loopy belief propagation (LBP) can suffer from poor convergence. In this paper, we provide a different approach for combining MRF learning and the Bethe approximation. We consider the dual of maximum likelihood Markov network learning (maximizing entropy with moment matching constraints) and then approximate both the objective and the constraints in the resulting optimization problem. Unlike previous work along these lines (Teh & Welling, 2003), our formulation allows parameter sharing between features in a general log-linear model, parameter regularization, and conditional training. We show that piecewise training (Sutton & McCallum, 2005) is a very restricted special case of this formulation. We study two optimization strategies: one based on a single convex approximation and one that uses repeated convex approximations. We show results on several real-world networks that demonstrate that these algorithms can significantly outperform learning with loopy belief propagation and piecewise training. Our results also provide a framework for analyzing the trade-offs of different relaxations of the entropy objective and of the constraints.
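To make the "inference in the inner loop" and "moment matching" ideas concrete, here is a minimal Python sketch of standard maximum likelihood learning for a tiny log-linear MRF. It is not the paper's algorithm: it uses exact inference by brute-force enumeration rather than the Bethe approximation, and the three-variable cycle, the two shared features, the data set, and all function names are hypothetical choices made for illustration. It relies only on the well-known fact that the gradient of the average log-likelihood is the empirical feature expectation minus the model feature expectation, so the fitted model matches the empirical moments.

```python
import itertools
import math

# Hypothetical toy model: 3 binary variables on a cycle. Parameters are shared:
# one weight for "edge endpoints agree" and one weight for "variable equals 1".
EDGES = [(0, 1), (1, 2), (2, 0)]
N_VARS = 3

# Hypothetical training data (assignments to the 3 variables).
DATA = [(0, 0, 0), (1, 1, 1), (1, 1, 0), (0, 1, 0), (1, 0, 1), (0, 0, 1)]


def features(x):
    """Feature vector f(x) = [number of agreeing edges, number of ones]."""
    agree = sum(1.0 for i, j in EDGES if x[i] == x[j])
    ones = float(sum(x))
    return [agree, ones]


def model_moments(theta):
    """Exact E_theta[f(x)] by enumerating all 2^N states (toy sizes only)."""
    states = list(itertools.product([0, 1], repeat=N_VARS))
    scores = [sum(t * f for t, f in zip(theta, features(x))) for x in states]
    log_z = math.log(sum(math.exp(s) for s in scores))
    probs = [math.exp(s - log_z) for s in scores]
    return [
        sum(p * features(x)[k] for p, x in zip(probs, states))
        for k in range(len(theta))
    ]


def empirical_moments(data):
    """Average feature vector over the training data."""
    dim = len(features(data[0]))
    return [sum(features(x)[k] for x in data) / len(data) for k in range(dim)]


def fit(data, steps=2000, lr=0.1):
    """Gradient ascent on the average log-likelihood
    (equivalently, gradient descent on its negative)."""
    target = empirical_moments(data)
    theta = [0.0] * len(target)
    for _ in range(steps):
        expected = model_moments(theta)  # inference in the inner loop
        # Gradient = empirical moments - model moments.
        theta = [t + lr * (m - e) for t, m, e in zip(theta, target, expected)]
    return theta
```

Calling `fit(DATA)` returns weights for which `model_moments` agrees with `empirical_moments(DATA)` up to numerical tolerance. In a realistic network the enumeration in `model_moments` is intractable, which is where approximate inference enters.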
