General-Purpose MCMC Inference over Relational Structures
This work aims to reduce the implementation effort required of researchers and practitioners who perform probabilistic inference over relational structures. Its contribution is incremental, building on existing MCMC and probabilistic modeling-language techniques.
The paper addresses the difficulty of implementing efficient MCMC inference for relational tasks such as record linkage. It proposes a general-purpose approach that combines a probabilistic modeling language with a generic Metropolis-Hastings algorithm operating on partial world states. In citation-matching experiments, this approach compares favorably with an application-specific system.
Tasks such as record linkage and multi-target tracking, which involve reconstructing the set of objects that underlie some observed data, are particularly challenging for probabilistic inference. Recent work has achieved efficient and accurate inference on such problems using Markov chain Monte Carlo (MCMC) techniques with customized proposal distributions. Currently, implementing such a system requires coding MCMC state representations and acceptance probability calculations that are specific to a particular application. An alternative approach, which we pursue in this paper, is to use a general-purpose probabilistic modeling language (such as BLOG) and a generic Metropolis-Hastings MCMC algorithm that supports user-supplied proposal distributions. Our algorithm gains flexibility by using MCMC states that are only partial descriptions of possible worlds; we provide conditions under which MCMC over partial worlds yields correct answers to queries. We also show how to use a context-specific Bayes net to identify the factors in the acceptance probability that need to be computed for a given proposed move. Experimental results on a citation matching task show that our general-purpose MCMC engine compares favorably with an application-specific system.
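The abstract describes a generic Metropolis-Hastings engine that accepts user-supplied proposals and uses context-specific dependencies to decide which factors of the acceptance probability must be recomputed. The Python sketch below illustrates that idea on a hypothetical toy citation-matching model. It is not the BLOG engine or its API, and the names `MODEL`, `text_cpd`, `affected_vars`, `local_log_ratio`, and `propose` are illustrative assumptions. The Metropolis-Hastings acceptance probability is min(1, p(x') q(x | x') / (p(x) q(x' | x))). Because p factors into one conditional-probability term per variable, the ratio p(x')/p(x) needs only the terms whose own variable changed or which read a changed variable in the current world. For simplicity, the sketch assumes complete worlds over a fixed set of objects, so it does not model partial worlds or unknown numbers of objects.

```python
import math
import random

# Hypothetical toy citation-matching model with a FIXED set of objects
# (not an open universe): publications have latent titles, and each
# citation's observed text depends on the publication it cites.
TITLES = ["mcmc", "blog", "bayes"]
PUBS = ["p1", "p2"]
CITS = ["c1", "c2"]


def uniform_prior(size):
    # Root variable: no parents, uniform distribution over `size` values.
    def cpd(world, v):
        return set(), math.log(1.0 / size)
    return cpd


def text_cpd(c):
    # Text(c) reads PubCited(c) and then Title(PubCited(c)), so its parent
    # set depends on the current world (a context-specific dependency).
    def cpd(world, v):
        title_var = ("Title", world[("PubCited", c)])
        parents = {("PubCited", c), title_var}
        if world[v] == world[title_var]:
            p = 0.9
        else:
            p = 0.1 / (len(TITLES) - 1)
        return parents, math.log(p)
    return cpd


# Each CPD returns (variables it read in this world, log-probability term).
MODEL = {("Title", p): uniform_prior(len(TITLES)) for p in PUBS}
for c in CITS:
    MODEL[("PubCited", c)] = uniform_prior(len(PUBS))
    MODEL[("Text", c)] = text_cpd(c)


def log_joint(world):
    # Full log-probability: sum of every CPD term (used only for checking).
    return sum(cpd(world, v)[1] for v, cpd in MODEL.items())


def affected_vars(world, changed):
    # A term can differ only if its own variable changed or it read a changed
    # variable in the current world. This scans every variable for clarity;
    # a real engine would keep an index from variables to their children.
    out = set(changed)
    for v, cpd in MODEL.items():
        parents, _ = cpd(world, v)
        if not parents.isdisjoint(changed):
            out.add(v)
    return out


def local_log_ratio(world, changed):
    # log p(x') - log p(x), computed from the affected terms only.
    new_world = {**world, **changed}
    return sum(MODEL[v](new_world, v)[1] - MODEL[v](world, v)[1]
               for v in affected_vars(world, changed))


def propose(world, rng):
    # Hypothetical user-supplied proposal. Returns the changed variables and
    # log q(x | x') - log q(x' | x); both moves are symmetric, so that is 0.
    if rng.random() < 0.5:
        c = rng.choice(CITS)
        current = world[("PubCited", c)]
        return {("PubCited", c): rng.choice([p for p in PUBS if p != current])}, 0.0
    return {("Title", rng.choice(PUBS)): rng.choice(TITLES)}, 0.0


def mh_step(world, rng):
    # One Metropolis-Hastings step; exp(min(0, .)) avoids log(0) on rng draws.
    changed, log_q_ratio = propose(world, rng)
    log_alpha = local_log_ratio(world, changed) + log_q_ratio
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return {**world, **changed}, True
    return world, False


# Usage: observed texts stay fixed because the proposal never changes them.
world = {("Title", "p1"): "mcmc", ("Title", "p2"): "blog",
         ("PubCited", "c1"): "p2", ("PubCited", "c2"): "p1",
         ("Text", "c1"): "mcmc", ("Text", "c2"): "mcmc"}
rng = random.Random(0)
for _ in range(1000):
    world, _ = mh_step(world, rng)
```

Because `affected_vars` consults the parents each term read in the current world, moving c1 from p2 to p1 touches only PubCited(c1) and Text(c1). Text(c2) is skipped even though it also depends on a Title variable, which is the saving that context-specific dependencies provide.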